ON THE POSTULATE OF THE ARITHMETIC MEAN


By Richmond T. Zoch


Introduction 


Suppose n observations have been made of an unknown quantity. It is de- 
sired to know the most probable value of the unknown. When Gauss gave his 
development of the so-called Normal Law of Error, he assumed that the Arithmetic 
Mean of the n observations is the most probable value. The question arises: Can 
this postulate be justified? 

In the excellent book, entitled “Calculus of Observations,” by Whittaker and
Robinson,¹ there is given a proof which purports to deduce the postulate of the
Arithmetic Mean from assumptions of a more elementary nature. This proof 
is not correct. 

Since this book has had wide circulation, it is believed that the errors in this 
proof should be called to the attention of the users of the book. The present 
paper has been prepared for this purpose. The first part of this paper points 
out the questionable features of the proof given in Whittaker and Robinson’s 
book. The second part gives some critical comments on the original sources 
from which Whittaker and Robinson obtained their proof. 


Part 1 


The assumptions on which Whittaker and Robinson based their proof of the 
postulate of the Arithmetic Mean are: 

Axiom I. The differences between the most probable value and the indi- 
vidual measures do not depend on the position of the null-point from which 
they are reckoned. 

Axiom II. The ratio of the most probable value to any individual measure 
does not depend on the unit in terms of which the measures are reckoned. 

Axiom III. The most probable value is independent of the order in which the 
measurements are made, and so is a symmetric function of the measures. 

Axiom IV. The most probable value, regarded as a function of the individual 
measures, has one-valued and continuous first derivatives with respect to them. 

It is fairly easy to show that if the Arithmetic Mean is the most probable
value, then the above four axioms follow as conclusions. The converse, viz. that
if the above four axioms be assumed then the Arithmetic Mean is the most
probable value, is, however, not true. That is to say, the above assumptions are
necessary conditions, but not sufficient conditions. For, consider the following
function of the measures:

$$\frac{\mu_3}{\mu_2} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2},$$

where $\bar{x}$ is the Arithmetic Mean of the $x_i$.

¹ The Calculus of Observations by E. T. Whittaker and G. Robinson, Blackie & Son, Ltd.,
London (1929), pp. 215-217.

Clearly this function is a symmetric function of the measures ($x_i$) and
therefore satisfies Axiom III. If the $x_i$ are each multiplied by $k$ then the
Arithmetic Mean ($\bar{x}$) is also multiplied by $k$ and we have

$$\frac{\frac{1}{n}\sum_{i=1}^{n}(kx_i - k\bar{x})^3}{\frac{1}{n}\sum_{i=1}^{n}(kx_i - k\bar{x})^2} = k\,\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = k\,\frac{\mu_3}{\mu_2};$$

that is to say, if we multiply the individual measures by $k$ it is the same as
multiplying the function $\mu_3/\mu_2$ by $k$ and therefore the ratio of any
individual measure to the most probable value (function) does not depend on the
unit used. Hence the function $\mu_3/\mu_2$ satisfies Axiom II.


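The partial derivative of $\mu_3/\mu_2$ with respect to $x_1$ is

$$\frac{\partial}{\partial x_1}\left(\frac{\mu_3}{\mu_2}\right) = \frac{1}{\mu_2^2}\left\{\mu_2\cdot\frac{3}{n}\left[(x_1 - \bar{x})^2 - \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] - \mu_3\cdot\frac{2}{n}\left[(x_1 - \bar{x}) - \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})\right]\right\}$$

$$= \frac{3\mu_2(x_1 - \bar{x})^2 - 3\mu_2^2 - 2\mu_3(x_1 - \bar{x})}{n\mu_2^2},$$

since $\partial\bar{x}/\partial x_1 = 1/n$, and $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$.
The partial derivatives of $\mu_3/\mu_2$ with respect to each of the $x_i$ are of
the same literal form and clearly these partial derivatives are single valued
and continuous. Therefore the function $\mu_3/\mu_2$ satisfies Axiom IV.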


Now it can be shown that if $h$ be added to each $x_i$, then the function
$\mu_3/\mu_2$ is unchanged and hence this function does not satisfy Axiom I. (It
should be noted that the function $\mu_3/\mu_2$ is invariant under the
transformation specified by Axiom I.) However, consider the function
$\bar{x} + a\,\mu_3/\mu_2 = f$, where $a$ is a constant independent of the $x_i$.
Clearly, $f$ satisfies all of the four axioms.
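
The claim can be spot-checked numerically. The following is only a minimal sketch: the sample measures, the shift h, the unit factor k and the choice a = 1 are illustrative assumptions, and the function names are hypothetical. It tests Axioms I, II and III at one point (Axiom IV, on derivatives, is not tested) and confirms that f differs from the Arithmetic Mean there.

```python
from itertools import permutations
from statistics import fmean


def central_moment(xs, r):
    """Return the r-th central moment (1/n) * sum((x - mean)**r)."""
    m = fmean(xs)
    return sum((x - m) ** r for x in xs) / len(xs)


def f(xs, a=1.0):
    """Counterexample f = mean + a*mu3/mu2; undefined when all measures coincide."""
    return fmean(xs) + a * central_moment(xs, 3) / central_moment(xs, 2)


xs = [1.0, 3.0, 4.0]   # illustrative measures (assumption)
h, k = 2.5, 7.0        # illustrative shift of null-point and change of unit
tol = 1e-9

# Axiom I: shifting every measure by h shifts f by h.
axiom_1 = abs(f([x + h for x in xs]) - (f(xs) + h)) < tol
# Axiom II: multiplying every measure by k multiplies f by k.
axiom_2 = abs(f([k * x for x in xs]) - k * f(xs)) < tol
# Axiom III: f does not depend on the order of the measures.
axiom_3 = all(abs(f(list(p)) - f(xs)) < tol for p in permutations(xs))
# f is nevertheless not the Arithmetic Mean.
not_the_mean = abs(f(xs) - fmean(xs)) > 1e-3

print(axiom_1, axiom_2, axiom_3, not_the_mean)
```

A single sample point cannot prove the axioms in general; it only illustrates the algebra given above.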

Thus a function, distinct from the Arithmetic Mean, has here been exhibited 
which satisfies the four axioms given in Whittaker and Robinson’s book. Hence, 
these four axioms are not sufficient to establish the postulate of the Arithmetic 
Mean. The question arises: Where is the proof given by Whittaker and Robinson
lacking in rigor? The proof given is essentially as follows. (No part of the
proof given by Whittaker and Robinson is here omitted; in fact, for the sake of 
rigor and careful reasoning, further explanations are given and the various steps 
are numbered.) 

(1) Suppose the most probable value is expressed in terms of the $n$ measures
$x_1, x_2, \dots, x_n$ by the function $\phi(x_1, x_2, \dots, x_n)$; that is to
say the most probable value is some function, $\phi$, of the observations, or:
the most probable value $= \phi(x_1, \dots, x_n)$.

(2) By the theorem of the mean value in the differential calculus, which by
Axiom IV is applicable, we have

$$\phi(kx_1, kx_2, \dots, kx_n) = \phi(0, 0, \dots, 0) + kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \dots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right],$$

where the square brackets denote that every $x_i$ is to be replaced by
$\theta k x_i$ where $\theta$ lies between 0 and 1.

(3) By Axiom II, the left hand side $= k\phi(x_1, x_2, \dots, x_n)$.

(4) By the continuity of $\phi$, postulated in Axiom IV, the equation
$\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n)$ must hold in the
limit when $k$ is 0, that is $\phi(0, 0, \dots, 0) = 0$.

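(5) We now have

$$k\phi(x_1, x_2, \dots, x_n) = kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \dots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right],$$

or on dividing by $k$,

$$\phi(x_1, x_2, \dots, x_n) = x_1\left[\frac{\partial\phi}{\partial x_1}\right] + \dots + x_n\left[\frac{\partial\phi}{\partial x_n}\right].$$

(6) In this last equation let $k \to 0$: then each of the quantities
$[\partial\phi/\partial x_i]$ tends to a value which is independent of the $x$’s
and we can write

$$\phi(x_1, x_2, \dots, x_n) = c_1x_1 + \dots + c_nx_n,$$

where the $c$’s are independent of the $x$’s.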
(7) By Axiom III the $c$’s must all be equal, so

$$\phi(x_1, x_2, \dots, x_n) = c(x_1 + x_2 + \dots + x_n).$$

(8) From Axiom I we have

$$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h.$$

(9) If in this last equation we let the $x_i$ all approach zero then we have
$cnh = h$ and therefore $c = 1/n$ and finally

$$\phi(x_1, x_2, \dots, x_n) = \frac{1}{n}(x_1 + x_2 + \dots + x_n),$$

which states that $\phi$ = the most probable value = the Arithmetic Mean.

It should be noted that the first six steps involve only Axioms II and IV. Of 
these first six steps the second and sixth are questionable.

The sixth step involves the tacit assumption that the partial derivatives are
functions of $k$. These partial derivatives are not necessarily functions of $k$
and the example given above, viz., $f = \bar{x} + a\,\mu_3/\mu_2$, is a function
whose partial derivatives are independent of $k$; in fact no function of the form


will satisfy the tacit assumption involved in the sixth step; nor is F the most gen- 
eral function which will not satisfy the tacit assumption, thus take for example 


zm ™ a 
F as = aes m 
Digby + Cus 


Consider now the second step. Take the function
$\phi(y_1, y_2, \dots, y_n) = k\phi(x_1, x_2, \dots, x_n)$. Then, by Axiom II,
we have $y_i = kx_i$. Apply the Theorem of the Mean Value to $\phi(y_i)$ instead
of $\phi(x_i)$. Then

$$\phi(y_1, y_2, \dots, y_n) = \phi(0, 0, \dots, 0) + y_1\left[\frac{\partial\phi}{\partial y_1}\right] + \dots + y_n\left[\frac{\partial\phi}{\partial y_n}\right].$$

Now if we replace $y_i$ by $kx_i$ we obtain the equation given in the second
step except that the square brackets are now of the form

$$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}\right] \quad\text{and not}\quad \left[\frac{\partial\phi}{\partial x_i}\right]$$

as given by Whittaker and Robinson. It is difficult to decide whether by
$[\partial\phi/\partial x_i]$ Whittaker and Robinson mean

$$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial x_i}\right] \quad\text{or}\quad \left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right].$$

These last two expressions are not equal. To make the second step more clear
it is necessary to demonstrate that

$$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}\right] = \left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right],$$

and this has not been done. In order to demonstrate this equality further use
must be made of Axiom II. It appears that the questionable features of the
second step may be overcome by starting with the equation implied by Axiom
II, thus

$$\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n);$$

in other words $\phi$ is a homogeneous function of degree 1. Therefore use can be
made of Euler’s Theorem on homogeneous forms. In this way we obtain:

$$\phi = \sum_{i=1}^{n} x_i\frac{\partial\phi}{\partial x_i},$$

which is an abbreviation of the last equation given in the fifth step.
Now, making further use of Axiom II we have:

$$\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)} = \frac{\partial}{\partial(kx_i)}\,k\phi(x_1, x_2, \dots, x_n) = \frac{\partial}{\partial x_i}\,\phi(x_1, x_2, \dots, x_n).$$

It follows that

$$\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i} = \frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}.$$

From this development we conclude that for any function whatever which
satisfies Axiom II the last equation of the fifth step cannot possibly involve $k$.
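
Both facts used here (Euler's relation for a homogeneous function of degree 1, and the independence of its partial derivatives from k) can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of the original argument: it uses the counterexample f = mean + mu3/mu2 with a = 1, central finite differences with a hand-picked step, and hypothetical function names.

```python
from statistics import fmean


def f(xs, a=1.0):
    """Degree-1 homogeneous test function f = mean + a*mu3/mu2."""
    m = fmean(xs)
    mu2 = sum((x - m) ** 2 for x in xs) / len(xs)
    mu3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m + a * mu3 / mu2


def gradient(func, xs, eps=1e-6):
    """Approximate the partial derivatives of func at xs by central differences."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += eps
        down[i] -= eps
        grads.append((func(up) - func(down)) / (2 * eps))
    return grads


xs = [1.0, 3.0, 4.0]   # illustrative measures (assumption)
k = 5.0                # illustrative change of unit

grad_x = gradient(f, xs)
grad_kx = gradient(f, [k * x for x in xs])

# Euler's relation: sum of x_i * df/dx_i should reproduce f itself.
euler_gap = abs(sum(x * g for x, g in zip(xs, grad_x)) - f(xs))
# The partial derivatives at (k*x_1, ..., k*x_n) should equal those at (x_1, ..., x_n).
scale_gap = max(abs(p - q) for p, q in zip(grad_x, grad_kx))

print(euler_gap, scale_gap)
```

Both gaps should be of the order of the finite-difference error only, which is the numerical counterpart of the statement that the fifth-step equation cannot involve k.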

In order to overcome the defect in the sixth step it is necessary to make a more 
restrictive assumption. If in place of Axiom IV, we assume that “The most 
probable value, regarded as a function of the individual measures, has first partial 
derivatives with respect to them which are constant,” then the equation given in the
sixth step can be rigorously established. 

After the equation of the sixth step is rigorously established there remains an 
objection in the seventh step. The axioms do not explicitly state that the n 
observations must be functionally independent. Therefore suppose the $x_i$ are
functionally dependent according to the relation $x_i = y_i z$ where the $y_i$
are all constant. Then the function $f = \bar{x} + a\,\mu_3/\mu_2$ will have
partial derivatives with respect to the $x_i$ which are unequal and constant;
yet at the same time the function $f$ is a symmetrical expression of the $n$
variables.

Hence in order to establish the postulate of the Arithmetic Mean along the 
lines followed by Whittaker and Robinson it is necessary to make another restric- 
tive assumption slightly different from that proposed in the last paragraph but 
one, and assume (in addition to Axioms I and II) that the function has partial 
derivatives with respect to the $x_i$ which are equal.


Part 2


The first original paper consulted was one by Schiaparelli.² In this paper nine
propositions are presented, four of which are also called lemmas. From a strict
mathematical point of view the four propositions which Schiaparelli calls lemmas
are really postulates. Schiaparelli discusses these four lemmas at length; three
of these lemmas are the first three axioms given in Whittaker and Robinson’s
book. The fourth one is: “When, in the function $\phi$, all the variables ($x_i$)
take the same value $a$, the function itself becomes equal to $a$.” (This, as a
matter of fact, is the definition of an average.)

In his discussion of these lemmas, which are based partly on practical and 
partly on philosophical grounds, Schiaparelli points out that they are justified 
from the practical or statistical nature of the problem involved in arriving at the 
most probable value (Schiaparelli uses the term “true value”) of a set of
observations. In the present writer’s opinion, these discussions are the most
excellent part of Schiaparelli’s paper. These discussions are even more significant in
view of the fact that the later writers on this subject make no attempt whatso- 
ever to justify the use of their postulates. 

Schiaparelli remarks that we should have no reason for not expecting that a
small change in a single observation should produce a small change in the
function $\phi$; but he does not make this remark in the form of an explicit
postulate. This could have been done and, moreover, such a postulate of
continuity could be justified from the practical nature of the problem. It seems
that a more elegant procedure would have been to deduce the continuity of the
function and its derivatives from Axioms I and II. It will be shown later that
this is possible. From his remark on the continuity of the function,
Schiaparelli concludes that the partial derivatives of $\phi$ with respect to the
$x_i$ exist and are continuous. His method of arriving at this conclusion is not
valid, for it is well known that an arbitrarily assumed function may be
everywhere continuous and yet possess a derivative at no point.

Schiaparelli’s Proposition III states: “When in the function $\phi$ all the $x_i$
take the same value, then the $\partial\phi/\partial x_i$ become equal to each
other.” This Proposition is false. To show this, consider the function

$$f = \bar{x} + \frac{\mu_3}{\mu_2},$$

where the

$$\frac{\partial f}{\partial x_i} = \frac{1}{n} + \frac{3\mu_2[(x_i - \bar{x})^2 - \mu_2] - 2\mu_3(x_i - \bar{x})}{n\mu_2^2}.$$

² Giovanni Schiaparelli, Come si possa giustificare l’uso della media aritmetica nel
calcolo dei risultati d’osservazione, Rendiconti Reale Instituto Lombardo di Scienze e
lettere, Vol. XL (1907), pp. 752-764.

Now, when the $x_i$ all approach $a$ then both $f$ and $\partial f/\partial x_i$
become indeterminate forms. However, in this case $f$ takes an indeterminate form
which can be evaluated and it can be shown that $\mu_3/\mu_2$ will always have the
value zero, i.e., $f$ will have the value $a$ when all the $x_i = a$; while the
$\partial f/\partial x_i$ can take any value whatever and in general the
$\partial f/\partial x_i$ will not be equal when the $x_i \to a$. To illustrate:
Consider the observations $y_1 = 1$, $y_2 = 3$, $y_3 = 4$; then $\bar{y} = 8/3$
and $\mu_2 = 14/9$ and $\mu_3 = -20/27$, whence $f = 8/3 - 10/21$. Now assume that
these three observations all approach 2 in a certain way, i.e., let
$x_i = 2 + (y_i - 2)z$. Then

$$\bar{x} = 2 + (\bar{y} - 2)z = 2 + (2/3)z,$$

$$\mu_2(x) = \frac{z^2}{3}\sum (y_i - \bar{y})^2 = (14/9)z^2,$$

and

$$\mu_3(x) = \frac{z^3}{3}\sum (y_i - \bar{y})^3 = (-20/27)z^3,$$

whence $f = 2 + (2/3)z - (10/21)z$. Clearly as $z \to 0$ the $x_i \to 2$ and $f \to 2$.


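However,

$$\left.\frac{\partial f}{\partial x_1}\right|_{x_1 = 2 + (y_1 - 2)z} = \frac{1}{3} + \frac{131}{294},$$

$$\left.\frac{\partial f}{\partial x_2}\right|_{x_2 = 2 + (y_2 - 2)z} = \frac{1}{3} - \frac{253}{294},$$

$$\left.\frac{\partial f}{\partial x_3}\right|_{x_3 = 2 + (y_3 - 2)z} = \frac{1}{3} + \frac{122}{294}.$$

Thus the $\partial f/\partial x_i$ are not functions of $z$ and as the $x_i \to 2$
the $\partial f/\partial x_i$ remain constant and unequal.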
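
The three values can be reproduced with exact rational arithmetic. The sketch below evaluates the closed-form partial derivatives of f = mean + mu3/mu2 quoted above along the path x_i = 2 + (y_i - 2)z; the particular values of z tried and the function name grad_f are illustrative assumptions.

```python
from fractions import Fraction as F


def grad_f(xs):
    """Exact partial derivatives of f = mean + mu3/mu2 from the closed form."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu3 = sum((x - mean) ** 3 for x in xs) / n
    return [
        F(1, n) + (3 * mu2 * ((x - mean) ** 2 - mu2) - 2 * mu3 * (x - mean)) / (n * mu2 ** 2)
        for x in xs
    ]


ys = [F(1), F(3), F(4)]
expected = [F(1, 3) + F(131, 294), F(1, 3) - F(253, 294), F(1, 3) + F(122, 294)]

# Let the measures approach 2 along x_i = 2 + (y_i - 2) z for shrinking z.
results = {}
for z in (F(1), F(1, 10), F(1, 1000)):
    xs = [2 + (y - 2) * z for y in ys]
    results[z] = grad_f(xs)
    print(z, results[z] == expected)
```

Because the arithmetic is exact, equality for every z tried shows directly that the partial derivatives do not depend on z along this path.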


From his conclusion that the derivatives of $\phi$ exist and from Axiom I,
Schiaparelli obtains the equation $\sum_{i=1}^{n} \partial\phi/\partial x_i = 1$
(this equation being his Proposition V) in the following way: Since the
derivatives of $\phi$ exist, then by the Theorem of the mean value,

$$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h\left(\frac{\partial\phi}{\partial x_1} + \frac{\partial\phi}{\partial x_2} + \dots + \frac{\partial\phi}{\partial x_n}\right). \tag{A}$$

By Axiom I:

$$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h.$$

Whence $\sum_{i=1}^{n} \partial\phi/\partial x_i = 1$. Now this equation is
correct but the above proof of it is not convincing. Clearly, according to the
Theorem of the mean value, in equation (A) it is necessary to replace each $x_i$
in the $\partial\phi/\partial x_i$ by $\theta x_i$ where $\theta$ is between 0
and 1.


Schiaparelli’s Proposition VII states in effect that the $\partial\phi/\partial x_i$
are invariant under the transformation $x_i' = x_i + h$ where $h$ is constant, and
his Proposition IX states that the $\partial\phi/\partial x_i$ are invariant under
the transformation $x_i' = kx_i$ where $k$ is a constant. These two propositions
are correct and are correctly established.
Making use of his Propositions III (which is false), V, VII and IX, Schiaparelli 
proceeds to the establishment of the postulate of the Arithmetic Mean, as 
follows: 

Let $a = \phi(x_i)$. As the $x_i$ vary, then $a$ varies but for a particular set
of $x_i$ then $a$ is a constant. Now by Axiom I we have

$$a + (m - 1)a = \phi(x_1 + (m - 1)a,\; x_2 + (m - 1)a,\; \dots,\; x_n + (m - 1)a) = ma$$

for all values of $m > 1$. Then by Axiom II:

$$a = \phi\left(\frac{x_1 + (m - 1)a}{m},\; \frac{x_2 + (m - 1)a}{m},\; \dots,\; \frac{x_n + (m - 1)a}{m}\right) = \phi\left(\frac{x_1 - a}{m} + a,\; \frac{x_2 - a}{m} + a,\; \dots,\; \frac{x_n - a}{m} + a\right).$$

And by Propositions VII and IX, the $\partial\phi/\partial x_i$ are unchanged
during the above transformations. Hence the last equation is true when
$m \to \infty$ and by Proposition III (false) the
$\partial\phi/\partial x_i = 1/n$ as when $m \to \infty$, $\phi(x_i) = a$. In this
final proof Schiaparelli gives a geometric illustration of each step.

It is both interesting and strange to know that in closing his paper
Schiaparelli does not claim that the Arithmetic Mean is the only function which
will satisfy all of his postulates. In fact he himself points out that the
function $\phi$, implicitly defined by the equation
$\sum_{i=1}^{n} (\phi - x_i)^m = 0$ where $m$ is an odd integer $> 1$, will
satisfy all of his postulates. Furthermore he points out that this function will
not satisfy his Proposition III. Schiaparelli’s object was to establish the
postulate of the Arithmetic Mean without any appeal to the concept of
probability. To accomplish this he made four assumptions each of which he
justified by a priori reasoning. Then he proceeded with the above proof. Why he
should have been satisfied with his own proof after perceiving the function
defined by $\sum_{i=1}^{n} (\phi - x_i)^m = 0$ is hard to understand.

The second paper³ consulted was also by Schiaparelli. It is merely an
abridged form of the one just discussed. Schiaparelli wrote two earlier papers on 
this same subject (altogether Schiaparelli wrote four papers on it) but it was 
inferred from the footnotes in his paper, which has just been discussed at length, 
that it contained all of the material of the two earlier papers with which he him- 
self was satisfied. Therefore Schiaparelli’s two earlier papers were not con- 
sulted. 

The third paper consulted was that by Broggi.⁴ Broggi states that the purpose
of his paper is to establish the postulate of the Arithmetic Mean by purely
analytic methods which are more brief than Schiaparelli’s method. Broggi
words the assumptions upon which he bases his proof as follows:

1. $\phi$ is a symmetric function of its $n$ variables;

2. The partial derivatives are single-valued and finite;

3. We have $\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n)$;

4. We have $\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h$,
that is to say for $\phi$:

$$\sum_{i=1}^{n} \frac{\partial\phi}{\partial x_i} = 1. \tag{a}$$


Broggi does not explain why he used the postulate 2 but presumably it was in
order to exclude the function defined by $\sum_{i=1}^{n} (\phi - x_i)^m = 0$.
Consider the special case where $m = 3$. Then
$n\phi^3 - 3\phi^2\sum x_i + 3\phi\sum x_i^2 - \sum x_i^3 = 0$. Let
$p = 3\left(\frac{\sum x_i^2}{n} - \bar{x}^2\right)$ and
$q = \frac{3}{n}\bar{x}\sum x_i^2 - 2\bar{x}^3 - \frac{1}{n}\sum x_i^3$. Also put
$R = (p/3)^3 + (q/2)^2$ and let $A$ be the real cube root of $-q/2 + \sqrt{R}$ and
$B$ be the real cube root of $-q/2 - \sqrt{R}$. Then the three branches of $\phi$
can be explicitly written

$$\phi_1 = A + B + \bar{x}$$

$$\phi_2 = \omega A + \omega^2 B + \bar{x}$$

$$\phi_3 = \omega^2 A + \omega B + \bar{x}$$

where $\omega$ and $\omega^2$ are the two complex cube roots of unity. Now while
$\phi$ does not satisfy the postulate that the function be single valued,
$\phi_1$ satisfies this postulate as well as all the others and so does $\phi_2$
and also $\phi_3$. Hence, Broggi’s failure to comment at length on the function
$\sum_{i=1}^{n} (\phi - x_i)^m = 0$ is unsatisfying. As a matter of fact Broggi
fails to point out any of the defects of Schiaparelli’s paper, with the possible
exception that he shows Schiaparelli’s postulate which states $\phi = a$ when
each of the $x_i = a$ to be a consequence of Axioms I and II. This is done so
casually that it makes one wonder whether Broggi really was aware of the fact
that Schiaparelli’s postulates are not independent.

³ Giovanni Schiaparelli, Come si possa giustificare l’uso della media aritmetica nel
calcolo delle misure, senza fare alcuna ipotesi sulla legge di probabilità degli errori
accidentali, Astronomische Nachrichten, Band 176 (1907) pp. 206-212.

⁴ Ugo Broggi, Sur Le Principe De La Moyenne Arithmetique, L’Enseignement
Mathematique, XI (1909) pp. 14-17.
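
The real branch for m = 3 can be evaluated directly from the Cardano-type expression above. The following is a sketch under stated assumptions: the sample measures, the shift h and the unit factor k are illustrative, the helper names real_cbrt and phi_1 are hypothetical, and floating-point arithmetic is used, so agreement is only up to rounding. It checks that the computed value solves the defining cubic, behaves as Axioms I and II require at this sample point, and is not the Arithmetic Mean.

```python
import math
from statistics import fmean


def real_cbrt(v):
    """Real cube root that also accepts negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def phi_1(xs):
    """Real branch of sum((phi - x_i)**3) = 0, written as A + B + mean."""
    n = len(xs)
    mean = fmean(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    p = 3.0 * (s2 / n - mean ** 2)
    q = 3.0 * mean * s2 / n - 2.0 * mean ** 3 - s3 / n
    big_r = max((p / 3.0) ** 3 + (q / 2.0) ** 2, 0.0)  # guard against rounding
    a = real_cbrt(-q / 2.0 + math.sqrt(big_r))
    b = real_cbrt(-q / 2.0 - math.sqrt(big_r))
    return a + b + mean


xs = [1.0, 3.0, 4.0]   # illustrative measures (assumption)
h, k = 2.5, 7.0        # illustrative shift and change of unit
root = phi_1(xs)

residual = sum((root - x) ** 3 for x in xs)                # defining equation
shift_gap = abs(phi_1([x + h for x in xs]) - (root + h))   # Axiom I
scale_gap = abs(phi_1([k * x for x in xs]) - k * root)     # Axiom II

print(root, fmean(xs), residual, shift_gap, scale_gap)
```

Symmetry (Axiom III) holds by construction, since only sums over the measures enter the computation.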

Broggi proves the Lemma: “A homogeneous function of the first degree which
is a solution of the equation of partial derivatives (a) is an integral function.”
This Lemma is correct and is correctly proved but its wording is apt to be
misleading; in fact it appears that its true meaning was not clear to Broggi
himself. For, while the function $\phi$ cannot be of the form $\psi/\chi$ where
$\psi$ is a homogeneous function of the $p$ degree which satisfies Axiom I and
$\chi$ a homogeneous function of the $(p - 1)$ degree which also satisfies Axiom I,
the Lemma does not mean and Broggi has not proved that $\phi$ cannot be of the
form $\phi = Q + \psi/\chi$ where $Q$ is an integral function satisfying Axioms I
and II and $\psi$ and $\chi$ are homogeneous functions of the $p$ and $(p - 1)$
degrees respectively which are invariant under the transformation specified in
Axiom I. By reason of this oversight, Broggi concludes that any function
satisfying Axioms I and II must be linear in its $n$ variables, a conclusion which
is erroneous.

The fourth paper consulted was that by Schimmack.⁵ Schimmack’s paper is
in three sections. The first section contains the proof which is essentially that
which Whittaker and Robinson give. In the second section Schimmack gives a
different proof, from a set of new postulates. The new set of postulates is:

Axiom I’ = Axiom I.

Axiom II’. The most probable value is independent of the sense of direction
of the scale upon which the observed values (and the most probable value) are
reckoned, that is to say,

$$\phi(-x_1, -x_2, \dots, -x_n) = -\phi(x_1, x_2, \dots, x_n).$$

Axiom III’ = Axiom III.

Axiom IV’. If from $n$ observed values, the most probable value be computed
and if one obtains an additional observed value then the most probable value of
the $n + 1$ observed values is the same as the most probable value of $n + 1$
quantities consisting of the initial most probable value counted $n$ times and the
$(n + 1)$th observed value, namely:

$$\phi_{n+1}(x_1, x_2, \dots, x_{n+1}) = \phi_{n+1}(\phi_n, \phi_n, \dots, \phi_n, x_{n+1}).$$

In explaining the object of this second section, Schimmack says that
postulating the existence of the derivatives (Axiom IV) seems unjustified and
ought to be avoided and only such axioms made which the intrinsic character of
the problem justifies. In connection with this statement of Schimmack’s it
appears that the intrinsic character of the problem certainly does not justify
Axiom IV’. In fact, Axiom IV’ appears to be quite artificial. Moreover,
Schimmack does not attempt to justify Axiom IV’ by a priori reasoning as
Schiaparelli does for Axioms I, II, and III. While, if the Arithmetic Mean is the
most probable value, Axiom IV’ follows, since it is a property of the Arithmetic
Mean, it does not seem to be in keeping with the intrinsic character of the
problem to use this property as a starting point for later deductions.

⁵ Rudolf Schimmack, Der Satz vom arithmetischen Mittel in axiomatischer
Begründung, Mathematische Annalen, Band 68 (1909) pp. 125-132, 304.
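
That Axiom IV’ is a property of the Arithmetic Mean, and not of averages in general, can be seen in a small exact computation. This is only an illustrative sketch: the sample values are assumptions, the helper name is hypothetical, and the median is brought in purely as a contrast (it is not discussed in the papers reviewed here).

```python
from fractions import Fraction
from statistics import mean, median


def satisfies_axiom_iv_prime(avg, xs, x_new):
    """Check avg(x_1, ..., x_n, x_new) == avg(avg_n, ..., avg_n, x_new)."""
    inner = avg(xs)                      # "most probable value" of the first n
    direct = avg(xs + [x_new])           # computed from all n + 1 observations
    replaced = avg([inner] * len(xs) + [x_new])
    return direct == replaced


xs = [Fraction(1), Fraction(3), Fraction(4)]   # illustrative observations
x_new = Fraction(10)                           # illustrative additional observation

mean_ok = satisfies_axiom_iv_prime(mean, xs, x_new)
median_ok = satisfies_axiom_iv_prime(median, xs, x_new)
print(mean_ok, median_ok)
```

For the mean the two sides agree identically, since replacing the first n values by their mean leaves the total unchanged; for the median they differ on this sample.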

As regards Schimmack’s objections to Axiom IV, all of the conditions specified
by it can be deduced from the first two Axioms except that the derivatives must
be single-valued. To show that this is true, consider an arbitrary function
which satisfies Axioms I and II. Let this function be $\phi(x_1, x_2, \dots, x_n)$.
We do not know that $\phi$ is continuous or that $\phi$ has any derivatives. All we
assume is that $\phi$ satisfies the first three Axioms and it is here proven that
$\phi$ must be continuous and have continuous partial derivatives. By Axiom I we
can give increments to the $x_i$; hence we give each $x_i$ the same increment,
$\Delta x$, and then subtract $\phi$ and we have:

$$\phi(x_1 + \Delta x, x_2 + \Delta x, \dots, x_n + \Delta x) - \phi(x_1, x_2, \dots, x_n) = \Delta\phi,$$

but by Axiom I, $\Delta\phi = \Delta x$. Therefore
$\Delta\phi/\Delta x = 1 = d\phi/dx$. In other words, the total derivative of
$\phi$ exists and is constant. Therefore the total derivative of $\phi$ is
continuous. But since the total derivative exists, all of the partial
derivatives exist. By Axiom II, $\phi$ is a homogeneous function of the first
degree. Applying Euler’s Theorem for homogeneous forms, we have

$$\phi = x_1\frac{\partial\phi}{\partial x_1} + x_2\frac{\partial\phi}{\partial x_2} + \dots + x_n\frac{\partial\phi}{\partial x_n}.$$

Since the total derivative of $\phi$ is everywhere continuous, $\phi$ is also
everywhere continuous. Thus, the right hand side of the above equation is
everywhere continuous and each partial derivative is therefore everywhere
continuous.


As regards that part of Axiom IV which requires the $\partial\phi/\partial x_i$
to be single valued, it would seem more satisfactory to postulate that the
function $\phi$ is single-valued, for the single-valuedness of a derivative does
not insure the single-valuedness of the integral while the single-valuedness of
a function does insure the single-valuedness of the derivative where the
derivative exists.

In the third section of his paper, Schimmack shows Axioms I, II, III, and IV 
to be independent, and likewise Axioms I, II’, III and IV’. 

Schimmack does not mention any of the questionable features of Schiaparelli’s 
and Broggi’s papers. 

The fifth paper consulted was that by Suto.⁶ Suto’s assumptions are:

1°. $\phi(x, x, \dots, x) = x$ (This is Schiaparelli’s).

2°. $\phi(x_1 + y_1, x_2 + y_2, \dots, x_n + y_n) - \phi(x_1, x_2, \dots, x_n)$
depends on the values of $y_1, y_2, \dots, y_n$ only.

3°. = Axiom III = Axiom III’.

⁶ Onosaburo Suto, Law of the Arithmetical Mean, Tohoku Mathematical Journal, Vol.
6 (1914) pp. 79-81.

Suto says he believes these assumptions to be more simple and natural than
Schimmack’s Axioms I’-IV’. However, assumption 2° appears to be quite
artificial and very restrictive. Suto does not even attempt to justify it by a
priori reasoning.

Suto shows his three Axioms to be independent. It is interesting to know that 
Suto has established the postulate of the Arithmetic Mean rigorously using only 
three postulates while Schiaparelli, Broggi and Schimmack failed using four 
postulates. In this connection it should be observed that when Axiom IV as 
given by Whittaker and Robinson is replaced by “The most probable value,
regarded as a function of the individual measures, has first partial derivatives
with respect to them which are equal” as suggested at the end of Part 1, then
Axiom III can be deduced as a consequence of Axioms I, II and the reworded 
Axiom IV, so that three Axioms only are sufficient to deduce the postulate of the 
Arithmetic Mean. However, it would be difficult to justify the reworded Axiom 
IV from the nature of this problem of the Arithmetic Mean. 

Suto does not point out any of the defects of the preceding papers. 

The last paper consulted was that by Beetle.⁷ It deals with the third section
of Schimmack’s paper. Beetle also fails to point out any of the defects of the 
preceding papers. 


Conclusion 


The postulate of the Arithmetic Mean can be rigorously established, without 
the use of the concept of probability, if sufficiently restrictive assumptions are 
made. The writers making sufficiently restrictive assumptions have failed to 
justify the use of them. Several proofs of the postulate of the Arithmetic 
Mean are clearly erroneous. The existing attempts to establish the postulate of 
the Arithmetic Mean without any appeal to the concept of probability are, 
therefore, unsatisfactory.


THE SHRINKAGE OF THE BROWN-SPEARMAN PROPHECY
FORMULA


By Robert J. Wherry


At the recent meeting of the Conference on Individual Psychological Differ- 
ences held in Washington, Dr. Clark Hull of Yale University called attention to 
the fact that the much used Brown-Spearman formula involves, or leads to, if 
used without regard to certain limitations, a certain over-optimism.¹ In other
words, if only this formula is taken into account, one would assume that the mere 
increasing in length of a test would automatically and, with continued increases 
in length, indefinitely continue to increase its reliability or validity. 

On the other hand, we know that the greater the number of test units the 
greater the shrinkage between the predicted and actually obtained value. At 
least we know this to be true when the value in question is a multiple correlation 
coefficient and the test units are independent variables. Hull raised the question 
as to whether or not the same fact might be true of the figures predicted by the 
Brown-Spearman formula. It is the purpose of this article to show that this 
shrinkage does occur, and that the Wherry-Smith shrinkage formula² satisfactorily
predicts this shrinkage.

A quick review of the nature of the two formulae (the Brown-Spearman and 
the Wherry-Smith formulae) will at once show the importance of the discussion. 


The Brown-Spearman formula, as applied to the predicting of reliability, reads
as follows,

$$R = \frac{M r_{11}}{1 + (M - 1) r_{11}}, \tag{1}$$

where $R$ = the predicted reliability,
$r_{11}$ = the discovered reliability,
and $M$ = the number of times the test is lengthened. Thus the formula provides
that the predicted reliability ($R$) will increase with each increase in $M$, but
it is to be noted that the increase in $R$ decreases with each increase in $M$ as
the value of $R$ approaches its limit of plus one.
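
The behaviour just described can be seen in a few lines of code. This is a minimal sketch of formula (1); the function name is hypothetical, the starting value 0.419 is one of the observed single-judgment reliabilities listed in Table I, and the range of M is an illustrative assumption.

```python
def brown_spearman(r11, m):
    """Formula (1): predicted reliability of a test lengthened m times."""
    return m * r11 / (1.0 + (m - 1) * r11)


r11 = 0.419  # an observed single-judgment reliability listed in Table I
predicted = [brown_spearman(r11, m) for m in range(1, 8)]
increments = [later - earlier for earlier, later in zip(predicted, predicted[1:])]

for m, value in enumerate(predicted, start=1):
    print(m, round(value, 3))
```

Each successive increment is positive but smaller than the one before it, and every predicted value stays below one.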
On the other hand the Wherry-Smith formula, which reads,

$$\bar{R}^2 = \frac{(N - 1)R^2 - (M - 1)}{N - M}, \tag{2}$$

where $\bar{R}$ = the predicted value of the correlation,
$R$ = the discovered correlation,
$M$ = the number of independent variables
and $N$ = the statistical population (the number of cases), provides that, for
each increase in $M$, the shrinkage in $\bar{R}$ as compared with $R$ increases.
Thus, if we assume that the $M$’s in the two formulae are analogous, i.e., if we
assume the Wherry-Smith formula to be applicable to the Brown-Spearman formula,
we see that as $M$ increases the Brown-Spearman formula adds a decreasing
increment while the Wherry-Smith formula provides that an increasing decrement
be subtracted; thus eventually we arrive at a point where by further increasing
the length of the test we will decrease rather than increase the size of the
reliability coefficient.

If our hypothesis be true, we must, then, in order to predict the correct value
of $\bar{R}$, substitute the value of equation (1) in equation (2). Doing this we
have

$$\bar{R}^2 = \frac{(N - 1)M^2r_{11}^2 - (M - 1)^3r_{11}^2 - 2(M - 1)^2r_{11} - (M - 1)}{(N - M)\,[1 + 2(M - 1)r_{11} + (M - 1)^2r_{11}^2]}, \tag{3}$$

which would then be the form in which the Brown-Spearman formula should be
used in predicting reliability corrected for chance error by the Wherry-Smith
formula. The same result can of course be secured by applying the formulae
consecutively.
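
The equivalence of formula (3) with the consecutive application of formulae (1) and (2), and the eventual decline of the corrected coefficient, can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, N = 37 and the starting reliability 0.419 are taken from Table I, and the lengths M tried are illustrative. No attempt is made here to reproduce the tabulated values.

```python
def brown_spearman(r11, m):
    """Formula (1): predicted reliability of a test lengthened m times."""
    return m * r11 / (1.0 + (m - 1) * r11)


def wherry_smith_sq(r, n, m):
    """Formula (2): shrunken squared correlation for n cases and m variables."""
    return ((n - 1) * r ** 2 - (m - 1)) / (n - m)


def combined_sq(r11, n, m):
    """Formula (3): formula (1) substituted into formula (2)."""
    num = ((n - 1) * m ** 2 * r11 ** 2 - (m - 1) ** 3 * r11 ** 2
           - 2 * (m - 1) ** 2 * r11 - (m - 1))
    den = (n - m) * (1 + 2 * (m - 1) * r11 + (m - 1) ** 2 * r11 ** 2)
    return num / den


n_cases, r11 = 37, 0.419        # values appearing in Table I
lengths = (5, 10, 15, 20, 30)   # illustrative values of M (must stay below N)

two_step = [wherry_smith_sq(brown_spearman(r11, m), n_cases, m) for m in lengths]
one_step = [combined_sq(r11, n_cases, m) for m in lengths]
max_gap = max(abs(a - b) for a, b in zip(two_step, one_step))

# Square roots are taken only because every squared value is positive here.
corrected = [value ** 0.5 for value in one_step]
for m, value in zip(lengths, corrected):
    print(m, round(value, 3))
```

With these inputs the corrected coefficient first rises with M and then falls, which is the turning point described in the text.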

In order to test the formula (3), the writer has applied it to some empirical 
data. <A recent article by H. H. Remmers of Purdue University furnishes the 
needed data. Remmers study dealt with the increase in reliability due to in- 
crease in the number of judgments of certain traits of college professors. His 
results, together with the results of applying formula (3) to the data are shown 
in Table I. 

An inspection of Table I shows at once that while the Brown-Spearman 
see that as M increases the Brown-Spearman formula adds a decreasing increment,
while the Wherry-Smith formula provides that an increasing decrement be
subtracted. Thus we eventually arrive at a point where further increasing the
length of the test will decrease rather than increase the size of the reliability
coefficient.

If our hypothesis be true, we must, then, in order to predict the correct value
of R, substitute the value of equation (1) in equation (2). Doing this we have

$$R^2 = \frac{(N-1)M^2 r_{11}^2 - (M-1)^3 r_{11}^2 - 2(M-1)^2 r_{11} - (M-1)}{(N-M)\left[1 + 2(M-1)r_{11} + (M-1)^2 r_{11}^2\right]}, \tag{3}$$

which would then be the form in which the Brown-Spearman formula should be
used in predicting reliability corrected for chance error by the Wherry-Smith
formula. The same result can of course be secured by applying the formulae
consecutively.

In order to test formula (3), the writer has applied it to some empirical
data. A recent article by H. H. Remmers of Purdue University furnishes the
needed data. Remmers' study dealt with the increase in reliability due to
increase in the number of judgments of certain traits of college professors. His
results, together with the results of applying formula (3) to the data, are shown
in Table I.


TABLE I
Correlations Observed and Theoretical (Based upon Observed Means)
(N = 37 throughout)

            Observed    Correlation predicted          Error
    M       average     Br.-Sp.     Wherry       Br.-Sp.     Wherry
(Trait 1)
    1        .290
    5        .728        .671                    -.057       -.110
   10        .717        .803                     .086        .009
   15        .754        .860                     .106        .004
   20        .805        .891                     .087        .020
   30        .936        .925                    -.011       -.427
(Trait 5)
    1        .419
    5        .736        .783        .751         .047        .015
   10        .845        .878        .834         .033
   15        .887        .915        .856         .028
   20        .877        .935        .856         .058
   30        .876        .956        .745         .080       -.131
(Trait 10)
    1        .354
    5        .479        .733        .692         .254        .213
   10        .717        .846        .788         .129        .071
   15        .852        .892        .816         .040
   20        .636        .915        .822         .279        .186
   30        .805        .943        .655         .138       -.150
(All Traits)
    1        .320
   20        .898        .904        .822         .006       -.076
   30        .872        .933        .576         .061       -.296


TABLE II
Error in Predicting Reliability (Based upon Observed Means)

      Error             Brown-Spearman    Wherry
 over  .210                   2              1
  .151 to  .210                              1
  .091 to  .150               3
  .031 to  .090               8              1
 -.029 to  .030               3              6
 -.089 to -.030               1              3
 -.149 to -.090                              3
 -.209 to -.150
 below -.209                                 2


TABLE III
Rietz Criteria of Normality Applied to Results from Means

 Criterion    Normal    Brown-Spearman    Wherry
    μ1          0            .074          -.032
    β1          0            .561          -.283
    β2          3           2.008          3.180


An inspection of Table I shows at once that while the Brown-Spearman
formula gives results which are consistently too large (15 out of 17 times), the
Wherry-Smith formula gives results which are more nearly equally distributed
between positive and negative errors (7 to 10), tending to underestimate slightly.
The actual distribution of errors can be more easily seen by an inspection of
Table II.


TABLE IV
Correlations Observed and Theoretical (Based upon Observed Medians)
(N = 37 throughout)

                        Correlation predicted          Error
    M        Obs.       Br.-Sp.     Wherry       Br.-Sp.     Wherry
(Trait 1)
    1        .344
    5        .752        .724        .682        -.028       -.070
   10        .663        .840        .779         .177        .116
   15        .702        .887        .807         .185        .105
   20        .805        .913        .805         .108        .000
   30        .936        .940        .635         .004       -.301
(Trait 5)
    1        .450
    5        .760        .804        .776         .040        .016
   10        .856        .891        .852         .035       -.004
   15        .931        .925        .873        -.006       -.058
   20        .877        .942        .874         .065       -.003
   30        .876        .961        .778         .085       -.098
(Trait 10)
    1        .363
    5        .433        .740        .701         .307        .268
   10        .754        .851        .795         .097        .041
   15        .872        .895        .822         .023       -.050
   20        .898        .919        .820         .021       -.078
   30        .872        .945        .669         .073       -.203
(All Traits)
    1        .503
   20        .898        .953        .879         .055       -.019
   30        .872        .968        .829         .096       -.043

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Now, if our formula were perfectly correct, we should expect that the errors 
incurred by its use would be normally distributed about a mean error of zero. 
The Rietz criteria for normality of distribution were applied to these errors with 
results as shown in Table III.4 It can be readily seen that the Wherry correc- 
tion formula gave much better results than did the uncorrected Brown-Spearman 
formula when measured by the Rietz criteria. 
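
The predicted columns of the tables can be checked numerically. The following is a minimal sketch, not the writer's own computation: it assumes that equation (1) is the usual Brown-Spearman formula and that the correction step has the form implied by formula (3), namely R_c^2 = [(N - 1)R^2 - (M - 1)] / (N - M). The function names are hypothetical. The sample values r11 = .344 and N = 37 are those of Trait 1 in Table IV.

```python
import math


def brown_spearman(r11: float, m: float) -> float:
    """Predicted reliability of a test lengthened m times (Brown-Spearman)."""
    return m * r11 / (1.0 + (m - 1.0) * r11)


def wherry_corrected(r11: float, m: float, n: int) -> float:
    """Brown-Spearman prediction followed by the correction implied by formula (3).

    Assumed form: R_c^2 = ((n - 1) * R^2 - (m - 1)) / (n - m).
    Returns NaN when the corrected square is negative (no real R exists).
    """
    r = brown_spearman(r11, m)
    r_squared = ((n - 1) * r * r - (m - 1)) / (n - m)
    return math.sqrt(r_squared) if r_squared >= 0 else float("nan")


if __name__ == "__main__":
    # Trait 1 of Table IV: observed single-judgment value .344, N = 37.
    for m in (5, 10, 15, 20, 30):
        print(m,
              round(brown_spearman(0.344, m), 3),
              round(wherry_corrected(0.344, m, 37), 3))
```

Applying the two steps consecutively, as the sketch does, is equivalent to using formula (3) directly, which is the point made in the text about applying the formulae consecutively.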

All of the results in the first three tables are based upon the means of the
results obtained by Remmers, since this was the method used in his paper.
However, when the number of cases is small, as it was in this study, it is
sometimes preferable to use the median rather than the mean as a basis of
calculation, since the median is less affected by extreme cases. The writer has
therefore recalculated the problem on the basis of the medians discovered by
Remmers, and the results are given in Tables IV, V, and VI. The results were found
to differ but little from those based upon the means of the distributions.


TABLE V
Error in Predicting Reliability (Based upon Observed Medians)

      Error             Brown-Spearman    Wherry
 over  .210                   1              1
  .151 to  .210               2
  .091 to  .150               3              2
  .031 to  .090               5              1
 -.029 to  .030               6              5
 -.089 to -.030                              5
 -.149 to -.090                              1
 -.209 to -.150                              1
 below -.209                                 1


TABLE VI
Rietz Criteria of Normality Applied to Results from Medians

 Criterion    Normal    Brown-Spearman    Wherry
    μ1          0            .074          -.018
    β1          0            .497          -.081
    β2          3           1.599          2.284

If we now assume that formula (3) has been empirically established and
justified, we must still answer a very practical question, namely, “How long
shall we make our tests in order to achieve the greatest reliability?” To answer
this question we must find the point at which R becomes a maximum with
respect to changes in M, assuming $r_{11}$ and N to be constant. To find this
point we must find the derivative of equation (3) with respect to M and set the
numerator equal to zero. Thus, if we write formula (3) in a slightly more usable
form, we have

$$R^2 = \frac{(N-1)M^2 r_{11}^2}{(N-M)\left[1 + 2(M-1)r_{11} + (M-1)^2 r_{11}^2\right]} - \frac{M-1}{N-M}, \tag{3}$$

whence

$$\frac{dR^2}{dM} = -\frac{(N-1)(1-r_{11})\,[1+(M-1)r_{11}]\,\{4r_{11}^2M^2 - [2Nr_{11}^2 - 3r_{11}(1-r_{11})]M + (1-r_{11})^2\}}{(N-M)^2\left[1 + 2(M-1)r_{11} + (M-1)^2 r_{11}^2\right]^2}, \tag{4}$$

which causes R to reach a maximum or minimum when the numerator is placed
equal to zero. Thus, placing the numerator equal to zero and factoring this
equation, we find its roots to be

$$M = \frac{-(1-r_{11})}{r_{11}}, \tag{5a}$$

$$M = \frac{2Nr_{11} - 3(1-r_{11}) - \sqrt{4N^2r_{11}^2 - 12Nr_{11}(1-r_{11}) - 7(1-r_{11})^2}}{8r_{11}}, \tag{5b}$$

or

$$M = \frac{2Nr_{11} - 3(1-r_{11}) + \sqrt{4N^2r_{11}^2 - 12Nr_{11}(1-r_{11}) - 7(1-r_{11})^2}}{8r_{11}}, \tag{5c}$$

and by substituting actual values of N and $r_{11}$ in the equations, we find that
equation (5c) is the root we are seeking, i.e., the value of M for which R
becomes a maximum.


TABLE VII
Showing the value of M which will give a maximum value for R
(According to the Brown-Spearman-Wherry-Smith formula)

                                   r11
   N      .10     .20     .30    .40    .50    .60    .70    .80    .90
   10    Imag.   Imag.     3      4      4      4      5      5      5
   20    Imag.     6       8      9      9     10     10     10     10
   30    Imag.    12      13     14     14     14     15     15     15
   40     11      17      18     19     19     19     20     20     20
   50     17      22      23     24     24     24     25     25     25
   60     22      27      28     29     29     29     30     30     30
   70     27      32      33     34     34     34     35     35     35
   80     32      37      38     39     39     39     40     40     40
   90     38      42      43     44     44     44     45     45     45
  100     43      47      48     49     49     49     50     50     50

It can also be readily seen that the quantity under the radical is nearly the
perfect square of the quantity outside the radical, $2Nr_{11} - 3(1-r_{11})$,
falling short of it by only $16(1-r_{11})^2$, so that the radical approximates
that quantity for large values of N. Thus, when N is large (exceeds 100) we may
secure satisfactory approximations to M if we rewrite equation (5c) in the form
below:

$$M \text{ (approximately)} = \frac{N}{2} - \frac{3(1-r_{11})}{4r_{11}}. \tag{5d}$$

Table VII shows the results of equation (5c) for values between N = 10 and
N = 100 (by increments of 10) for values of $r_{11}$ from .10 to .90 (by increments
of .10). The use of the formula does not yield integers, and so the results in the
table are recorded to the nearest whole number rather than exactly as given by
the formula.

If, in order to test the validity of formula (5c), we apply it to the values in 
Tables I and IV, we find fairly close agreement. The formula in each case pre- 
dicts a maximum value for R when M lies between 15 and 20, and in the actually 
lengthened tests R is found to be a maximum when M is 30, 15, 15, 20, 30, 15, 20, 
and 20, thus being in agreement six times out of eight. 
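
Equations (5c) and (5d) are easy to evaluate directly. The sketch below is only illustrative: the function names are hypothetical, and returning None for an imaginary root is a convention chosen here to mirror the "Imag." entries of Table VII.

```python
import math
from typing import Optional


def optimal_length(n: float, r11: float) -> Optional[float]:
    """Root (5c): the value of M at which R is a maximum.

    Returns None when the quantity under the radical is negative,
    i.e. when the roots are imaginary.
    """
    a = 1.0 - r11
    radicand = 4 * n ** 2 * r11 ** 2 - 12 * n * r11 * a - 7 * a ** 2
    if radicand < 0:
        return None
    return (2 * n * r11 - 3 * a + math.sqrt(radicand)) / (8 * r11)


def optimal_length_approx(n: float, r11: float) -> float:
    """Approximation (5d), intended for large N."""
    return n / 2 - 3 * (1 - r11) / (4 * r11)


if __name__ == "__main__":
    for n in (10, 40, 100):
        for r11 in (0.10, 0.50, 0.90):
            exact = optimal_length(n, r11)
            shown = "Imag." if exact is None else round(exact)
            print(n, r11, shown, round(optimal_length_approx(n, r11), 2))
```

For N = 100 and r11 = .50 the two expressions differ by less than .01, which illustrates why the simpler form is adequate for large N.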


Conclusions 


1. The Brown-Spearman formula appears to give results which contain both 
constant and chance errors. 

2. These errors can be practically eliminated by applying the Wherry-Smith
correction formula to the results obtained by the Brown-Spearman formula.

3. We may find the value of M which will give the greatest value of R by
substitution in equation (5c) above, and then by substitution of this value in
equation (3), find the most probable value of R at its maximum point.

4. For large values of N we may secure satisfactory approximations to M by 
means of the simpler formula (5d). 
THE LIKELIHOOD TEST OF INDEPENDENCE IN CONTINGENCY
TABLES


By S. S. Wilks


J. Neyman and E. S. Pearson¹ have applied the principle of the ratio of
likelihoods to the problem of determining criteria for testing various hypotheses
about the group frequencies in problems dealing with grouped data. In particular,
they have discussed the fundamental $\chi^2$ problem, the test of goodness of fit,
the hypothesis that two samples of grouped data are from the same population,
and the hypothesis of independence in contingency tables. In their treatment
of these problems, these authors have started from the limiting form of the
probability of an observed set of frequencies and have shown that approximately
each of the appropriate $\lambda$'s is a function of the minimum value of a
corresponding $\chi^2$. The distribution of this minimum value is found, from
which the significance test is made.

In certain cases the exact values of the $\lambda$'s are relatively simple functions
of the observations which can be as conveniently calculated as the corresponding
$\chi^2$'s. The purpose of this note is to consider the exact expressions for the
$\lambda$'s and find their asymptotic distributions in large samples for the
following hypotheses: (1) that a sample of grouped data is from a population with
specified group frequencies (i.e., the fundamental $\chi^2$ problem), (2) that
several samples of grouped data are from the same population, and (3) that there
is independence in a contingency table.


1. The fundamental $\chi^2$ problem. Let $p_1, p_2, \ldots, p_k$ be the
probabilities of the mutually exclusive events $E_1, E_2, \ldots, E_k$
respectively. In a sample of N events the probability that $E_1, E_2, \ldots, E_k$
will occur $n_1, n_2, \ldots, n_k$ times respectively is given by

$$C = \frac{N!}{n_1!\,n_2!\cdots n_k!}\;p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}. \tag{1}$$

If we let $\Omega$ be the class of all sets of values of the p's such that their
sum is unity, there is only one set of p's that maximizes C, namely,
$p_j = n_j/N$ $(j = 1, 2, \ldots, k)$. The maximum of C is

$$C(\Omega\,\max) = \frac{N!}{n_1!\,n_2!\cdots n_k!}\cdot\frac{n_1^{n_1}n_2^{n_2}\cdots n_k^{n_k}}{N^N}. \tag{2}$$

¹ Biometrika, vol. 20A (1928), pp. 263-294.

The likelihood of the hypothesis that the sample is from a population specified
by p's having the values $p_1, p_2, \ldots, p_k$ is defined as

$$\lambda_s = \frac{C}{C(\Omega\,\max)} = \left(\frac{Np_1}{n_1}\right)^{n_1}\left(\frac{Np_2}{n_2}\right)^{n_2}\cdots\left(\frac{Np_k}{n_k}\right)^{n_k}. \tag{3}$$

$\lambda_s$ is a quantity which clearly lies between 0 and 1. It will be 1 only when
$p_j = n_j/N$ $(j = 1, 2, \ldots, k)$ (that is, when the hypothesis is rigorously
supported by the sample) and tends to 0 as the sample values $n_j/N$ diverge more
and more from the hypothetical values $p_j$. The problem of making an exact
test of significance of an observed value of $\lambda_s$ would involve the
computation of all terms of form (1) the n's of which make $\lambda_s$ less than
the observed value of $\lambda_s$. This, of course, is impracticable except perhaps
for the binomial case with small values of N. However, if the n's are large we can
find an approximate solution. If we let $x_j = (n_j - Np_j)/\sqrt{N}$, then except
for terms of order $1/\sqrt{N}$ and higher, the x's are distributed according to
the law

$$\frac{1}{\sqrt{(2\pi)^{k-1}\,p_1p_2\cdots p_k}}\;e^{-\frac{1}{2}\sum_j \frac{x_j^2}{p_j}}, \tag{4}$$

where $\sum_j x_j = 0$. Neglecting terms of order $1/\sqrt{N}$ and higher we easily
find (using natural logarithms) $-2\log\lambda_s = \sum_j x_j^2/p_j$. Therefore,
if $\theta = -2\log\lambda_s$, $\theta$ is approximately distributed according to
the function

$$\frac{\theta^{\frac{k-3}{2}}\,e^{-\frac{\theta}{2}}}{2^{\frac{k-1}{2}}\,\Gamma\!\left(\frac{k-1}{2}\right)}, \tag{5}$$

which is the $\chi^2$ distribution with k - 1 degrees of freedom.

Since we have neglected terms of order $1/\sqrt{N}$ in obtaining (4) there is no
theoretical reason why $\chi^2$ should be used in preference to $-2\log\lambda_s$
as the criterion for testing the hypothesis that the sample is from a population
specified by $p_1, p_2, \ldots, p_k$. Any practical advantage which
$-2\log\lambda_s$ may have will therefore justify its use.
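
The step from the exact criterion to its approximate form can be made explicit. The following is a sketch of the expansion, written under the assumption that the deviations are defined as $x_j = (n_j - Np_j)/\sqrt{N}$ with $\sum_j x_j = 0$; it is offered as an illustration rather than as the author's own working.

```latex
\begin{align*}
-2\log\lambda_s
  &= 2\sum_j n_j \log\frac{n_j}{Np_j}
   = 2\sum_j \bigl(Np_j + \sqrt{N}\,x_j\bigr)
       \log\Bigl(1 + \frac{x_j}{\sqrt{N}\,p_j}\Bigr) \\
  &= 2\sum_j \bigl(Np_j + \sqrt{N}\,x_j\bigr)
       \Bigl(\frac{x_j}{\sqrt{N}\,p_j} - \frac{x_j^2}{2Np_j^2}
             + O\bigl(N^{-3/2}\bigr)\Bigr) \\
  &= 2\sqrt{N}\sum_j x_j + 2\sum_j \frac{x_j^2}{p_j}
     - \sum_j \frac{x_j^2}{p_j} + O\bigl(N^{-1/2}\bigr) \\
  &= \sum_j \frac{x_j^2}{p_j} + O\bigl(N^{-1/2}\bigr),
\end{align*}
```

The first sum vanishes because $\sum_j x_j = 0$, and the remaining quantity is the usual $\chi^2$ expressed in terms of the $x_j$, so the two criteria agree up to terms of order $1/\sqrt{N}$.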


2. The hypothesis that several samples of grouped data are from a common
population. Let $p_{i1}, p_{i2}, \ldots, p_{is}$ be the probabilities with which
the mutually exclusive events $E_{i1}, E_{i2}, \ldots, E_{is}$ occur, where
$\sum_j p_{ij} = 1$ $(i = 1, 2, \ldots, r)$. Then in a sample of $N_i$ events the
chance that $E_{i1}, E_{i2}, \ldots, E_{is}$ will occur
$n_{i1}, n_{i2}, \ldots, n_{is}$ times respectively is given by an expression
similar to (1). The chance of the joint occurrence of the r samples is

$$\frac{N_1!\,N_2!\cdots N_r!}{n_{11}!\,n_{12}!\cdots n_{rs}!}\;p_{11}^{n_{11}}p_{12}^{n_{12}}\cdots p_{rs}^{n_{rs}}. \tag{6}$$

We are interested in testing the hypothesis that the r samples are from the
same population, that is, that the r sets of p's, $p_{i1}, p_{i2}, \ldots, p_{is}$
$(i = 1, 2, \ldots, r)$, are the same. The likelihood criterion $\lambda_c$
appropriate to this hypothesis is the ratio of the maximum ($\omega(\max)$) of (6)
subject to the condition that the sets of p's are the same (that is,
$p_{ij} = p_j$ say, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) to the maximum
($\Omega(\max)$) of (6) without this restriction.

For convenience let the observations be arranged in table form so that $n_{ij}$ is
the frequency in the i-th row and j-th column. Let $n_{i\cdot}$ and $n_{\cdot j}$
be the totals of the i-th row and j-th column respectively, and N the total of all
observations. Thus $n_{i\cdot}$ is the same as $N_i$. The expression for
$\lambda_c$ will be

$$\lambda_c = \frac{n_{1\cdot}^{n_{1\cdot}}\,n_{2\cdot}^{n_{2\cdot}}\cdots n_{r\cdot}^{n_{r\cdot}}\;n_{\cdot 1}^{n_{\cdot 1}}\,n_{\cdot 2}^{n_{\cdot 2}}\cdots n_{\cdot s}^{n_{\cdot s}}}{N^N\,n_{11}^{n_{11}}\,n_{12}^{n_{12}}\cdots n_{rs}^{n_{rs}}}. \tag{7}$$

It can be shown analytically that $\lambda_c$ lies between 0 and 1. It can be 1
only when

$$\frac{n_{1j}}{N_1} = \frac{n_{2j}}{N_2} = \cdots = \frac{n_{rj}}{N_r}, \qquad j = 1, 2, \ldots, s,$$

that is, when the hypothesis of a common population is perfectly substantiated by
the samples. Because of the fact that the $n_{ij}$ are integers, it is clear that
$\lambda_c$ can be 1 only in exceptional cases, but it can take on values
arbitrarily near 1 for sufficiently large values of the $N_i$.

If the $N_i$ are large, the quantities $x_{ij} = (n_{ij} - N_ip_j)/\sqrt{N_i}$ are
approximately distributed according to the function

$$f = \left[\frac{1}{(2\pi)^{s-1}\,p_1p_2\cdots p_s}\right]^{\frac{r}{2}} e^{-\frac{1}{2}\sum_{i,j}\frac{x_{ij}^2}{p_j}}, \tag{8}$$

where $\sum_j x_{ij} = 0$, $i = 1, 2, \ldots, r$. By neglecting terms of order
$1/\sqrt{N}$ and higher, we find that

$$-2\log\lambda_c = \sum_{i,j}\frac{\left(Nx_{ij} - \sqrt{N_i}\,\sum_i \sqrt{N_i}\,x_{ij}\right)^2}{N^2\,p_j}. \tag{9}$$

Denoting the quantity on the right side of (9) by $\chi_0^2$, it follows by
straightforward analysis that the characteristic function $\phi(t)$ of $\chi_0^2$,
defined by the r(s - 1)-tuple integral
$\int\cdots\int e^{it\chi_0^2}\,f\,dx_{11}\cdots dx_{rs}$, has the value

$$\left(\tfrac{1}{2}\right)^{\frac{(r-1)(s-1)}{2}}\left(\tfrac{1}{2} - it\right)^{-\frac{(r-1)(s-1)}{2}}. \tag{10}$$

But it is well known that (10) is the characteristic function of any quantity
distributed according to (5) with (k - 1) replaced by (r - 1)(s - 1). This, of
course, is the $\chi^2$ distribution with (r - 1)(s - 1) degrees of freedom.

It will be noticed that the exact value of $\lambda_c$ is a function of the
observations $n_{ij}$ which is independent of the p's, while the approximate value
of $-2\log\lambda_c$ as given by (9) involves the p's. Before (9) could be used
practically, one would have to replace the p's by sample estimates, thus making
further approximations necessary in order to get the distribution. If the usual
estimates $p_j = n_{\cdot j}/N$ are used for the p's in $\chi_0^2$ we find that
$\chi_0^2$ reduces to

$$\sum_{i,j}\frac{\left(n_{ij} - \dfrac{n_{i\cdot}\,n_{\cdot j}}{N}\right)^2}{\dfrac{n_{i\cdot}\,n_{\cdot j}}{N}}, \tag{11}$$

which is the familiar $\chi^2$ function for testing independence in contingency
tables. However, (11) differs from $\chi_0^2$ by terms of the same order (i.e.,
$1/\sqrt{N}$) as those by which $\chi_0^2$ differs from $-2\log\lambda_c$. Since we
have neglected terms of the same order in obtaining (8), there is no theoretical
reason why (11) should be used rather than $-2\log\lambda_c$ for testing the
hypothesis that the r samples are from a common population.


3. The hypothesis of independence in contingency tables. We shall consider a
sample of N observations which can be arranged in a two-way contingency table
having r rows and s columns. Let $p_{ij}$ be the probability that an observation
will fall in the i-th row and j-th column. The probability that the sample of N
items will be distributed so that $n_{ij}$ will be the number falling in the i-th
row and j-th column $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, s)$ is given by

$$\frac{N!}{n_{11}!\,n_{12}!\cdots n_{rs}!}\;p_{11}^{n_{11}}p_{12}^{n_{12}}\cdots p_{rs}^{n_{rs}}. \tag{12}$$

Here we are interested in testing the hypothesis that the classification by rows
is independent of the classification by columns, that is, that $p_{ij}$ is of the
form $p_iq_j$ where

$$\sum_i p_i = 1, \qquad \sum_j q_j = 1. \tag{13}$$

For this hypothesis the appropriate likelihood criterion, say $\lambda'_c$, is the
ratio of the maximum ($\omega(\max)$) of (12) when $p_{ij} = p_iq_j$, restricted by
the conditions (13), to the maximum ($\Omega(\max)$) of (12) subject only to the
condition that $\sum_{i,j} p_{ij} = 1$. $\lambda'_c$ turns out to be identical with
$\lambda_c$ in (7). When the hypothesis of independence is true, the approximate
distribution of the quantity $-2\log\lambda'_c$ is the same as that of
$-2\log\lambda_c$ when the hypothesis of a common population is true. To show that
the distributions are the same we note that by placing

$$x_{ij} = \frac{n_{ij} - Np_iq_j}{\sqrt{N}} \tag{14}$$

we find from (12) that the $x_{ij}$ are approximately distributed according to the
function

$$\frac{1}{(2\pi)^{\frac{rs-1}{2}}\,(p_1p_2\cdots p_r)^{\frac{s}{2}}\,(q_1q_2\cdots q_s)^{\frac{r}{2}}}\;e^{-\frac{1}{2}\sum_{i,j}\frac{x_{ij}^2}{p_iq_j}}, \tag{15}$$

where $\sum_{i,j} x_{ij} = 0$. To the same degree of approximation we find

$$-2\log\lambda'_c = \sum_{i,j}\frac{\left(x_{ij} - p_i\,x_{\cdot j} - q_j\,x_{i\cdot}\right)^2}{p_iq_j} = \chi_0'^2, \tag{16}$$

the dot denoting summation over the index it replaces.

Now the characteristic function of $\chi_0'^2$ can be shown without much difficulty
to be identical with that of $\chi_0^2$ as given by (10). The identity of the
characteristic functions of $\chi_0'^2$ and $\chi_0^2$ implies the identity of the
asymptotic distributions of $-2\log\lambda'_c$ and $-2\log\lambda_c$. The problem
of testing the hypothesis of a common population in several samples of grouped
data is mathematically equivalent to that of testing the hypothesis of
independence in contingency tables.

If the usual estimates $p_i = n_{i\cdot}/N$, $q_j = n_{\cdot j}/N$ are used in (16)
we find that $\chi_0'^2$ becomes the expression given by (11). But (11) differs
from $\chi_0'^2$ by terms of order $1/\sqrt{N}$ and higher. Therefore,
$-2\log\lambda'_c$ and (11) can differ from each other only by terms of order
$1/\sqrt{N}$, which is the order of approximation involved in getting (15) from
(12). Thus, $-2\log\lambda'_c$ has as much validity as the usual criterion (11) for
testing for independence in contingency tables.

The $\lambda$ method can easily be extended to the case of contingency tables of
higher order. For example, in a three-way table of r rows, s columns and t
layers in which $n_{ijk}$ is the number of items observed in the i-th row, j-th
column and k-th layer, the $\lambda'_c$ criterion for testing the hypothesis of
independence, that is, that the probabilities $p_{ijk}$ are of the form
$p_{1i}\,p_{2j}\,p_{3k}$, is such that

$$-2\log\lambda'_c = 2\sum_{i,j,k}(n_{ijk}\log n_{ijk}) + 4N\log N - 2\sum_i (n_{i\cdot\cdot}\log n_{i\cdot\cdot}) - 2\sum_j (n_{\cdot j\cdot}\log n_{\cdot j\cdot}) - 2\sum_k (n_{\cdot\cdot k}\log n_{\cdot\cdot k}), \tag{17}$$

where $n_{i\cdot\cdot} = \sum_{j,k} n_{ijk}$, and so on. $-2\log\lambda'_c$ in this
case is approximately distributed like $\chi^2$ with rst - r - s - t + 2 degrees of
freedom.

4. Illustrative examples. To illustrate the use of $\lambda_s$ we shall consider
the following example given by R. A. Fisher² dealing with de Winton and Bateson's
data on results of interbreeding the hybrid (F1) generation of Primula in which
two factors are considered.


                        Flat Leaves              Crimped Leaves
                    Normal     Primrose        Lee's      Primrose
                     Eye       Queen Eye        Eye       Queen Eye     Total

Observed (n_j)       328         122             77          33          560
Expected (Np_j)      315         105            105          35          560


² Statistical Methods for Research Workers, 4th ed., p. 84.
If the two factors are Mendelian, that is, segregate independently, the four
classes of offspring resulting from interbreeding the F1 generation are expected
to appear in the ratio 9:3:3:1 (assuming all classes equally viable). We wish to
test the hypothesis of a 9:3:3:1 ratio. It is found that

$$-2\log_e\lambda_s = 2\log_e 10\left[\sum_j n_j\log_{10} n_j - \sum_j n_j\log_{10}(Np_j)\right] = 11.50.$$

Entering Fisher's $\chi^2$ table for n = 3, we find that the chance of exceeding
the value 11.50 is less than .01, which is significant if we take P = .05 as the
critical level of significant deviation. Thus, the observed frequencies cannot be
reasonably explained as chance deviations from the 9:3:3:1 ratio.

The usual $\chi^2$ method gives $\chi^2$ = 10.87 and n = 3 for the 9:3:3:1
hypothesis. The value of P in this case lies between .01 and .02. It follows from
the theoretical discussion that 10.87 has no greater validity than 11.50 in testing
this hypothesis.

We shall illustrate the use of $\lambda_c$ by using another example given by Fisher
dealing with Wachter's data for back-crosses in mice.


                      Black      Black      Brown      Brown
                      Self      Piebald     Self      Piebald     Total
Coupling:
  F1 males             88         82         75         60         305
  F1 females           38         34         30         21         123
Repulsion:
  F1 males            115         93         80        130         418
  F1 females           96         88         95         79         358
Total                 337        297        280        290        1204


The back-crosses were made according as the male or female parents of the
F1 generation were heterozygous in the two factors Black-Brown, Self-Piebald,
and according to whether the two dominant genes came both from one parent
(Coupling) or one from each parent (Repulsion). We wish to test the hypothesis
that the proportions are independent of the matings used. We find

$$-2\log_e\lambda_c = 2\log_e 10\left[\sum_{i,j} n_{ij}\log_{10} n_{ij} + N\log_{10} N - \sum_i n_{i\cdot}\log_{10} n_{i\cdot} - \sum_j n_{\cdot j}\log_{10} n_{\cdot j}\right] = 21.69.$$

Entering Fisher's $\chi^2$ table for n = 9 we find that the chance of exceeding
this value is less than .01. The departure from the hypothesis of independence is
significant on the basis of the P = .05 level. The $\chi^2$ method gives the
remarkably close result $\chi^2$ = 21.83, which, with n = 9, gives P < .01.
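
Both examples can be reproduced with a few lines of code. This is a minimal sketch using natural logarithms throughout, not the common-logarithm arrangement of the text, and the function names are hypothetical. Two data points are assumptions: the Primula counts 328, 122, 77, 33 are taken from Fisher's published table, and the entry 93 in the mice table is inferred from the row and column totals.

```python
import math


def _xlogx(x: float) -> float:
    """x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def fit_statistics(observed, probs):
    """Return (-2 ln lambda, chi-square) for specified group probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return g, chi2


def table_statistics(table):
    """Return (-2 ln lambda, chi-square) for independence in a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # -2 ln lambda = 2 [sum n_ij ln n_ij + N ln N - sum n_i. ln n_i. - sum n_.j ln n_.j]
    g = 2.0 * (sum(_xlogx(x) for row in table for x in row) + _xlogx(n)
               - sum(_xlogx(x) for x in row_tot)
               - sum(_xlogx(x) for x in col_tot))
    chi2 = sum((x - ri * cj / n) ** 2 / (ri * cj / n)
               for row, ri in zip(table, row_tot)
               for x, cj in zip(row, col_tot))
    return g, chi2


# Assumed data (see the note above the code).
PRIMULA = [328, 122, 77, 33]
RATIO = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
MICE = [[88, 82, 75, 60],
        [38, 34, 30, 21],
        [115, 93, 80, 130],
        [96, 88, 95, 79]]

if __name__ == "__main__":
    print("Primula:", [round(v, 2) for v in fit_statistics(PRIMULA, RATIO)])
    print("Mice:   ", [round(v, 2) for v in table_statistics(MICE)])
```

The table form needs only the cell counts and the marginal totals, which is the practical advantage claimed for the likelihood criterion.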


5. Summary. We have considered the exact expressions for the Neyman-Pearson
$\lambda$ criteria appropriate to the following hypotheses: (1) that a sample
of grouped data is from a population with specified group proportions (the
fundamental $\chi^2$ problem), (2) that several samples of grouped data are from a
common population, (3) that there is independence in a contingency table. The
quantity $-2\log\lambda$ for each of these cases is approximately distributed like
$\chi^2$, the number of degrees of freedom being given in each case. It is shown
that the usual $\chi^2$ method of testing these hypotheses has no greater
theoretical validity than the $\lambda$ method. On the practical side, it is to be
remarked that $-2\log\lambda$ can be computed with fewer operations than $\chi^2$.
Two examples are given to illustrate the practical application of the $\lambda$
method.


PRINCETON UNIVERSITY. 








THE PROBABILITY THAT THE MEAN OF A SECOND SAMPLE WILL 
DIFFER FROM THE MEAN OF A FIRST SAMPLE BY LESS THAN 
A CERTAIN MULTIPLE OF THE STANDARD DEVIATION OF 
THE FIRST SAMPLE 


By G. A. Baker, Ph.D.


The following statement of the significance of a probable error is often made: 
“The probable error of the mean is a value above and below the mean such that 
if the test were repeated under the same conditions there would be, on the 
average, equal chances that the mean would fall within or without this range.” 
The probable error is attached to the mean of the sample and it is assumed that 
the standard deviation of the sample is that of the sampled normal population. 
This was formerly a very usual explanation of the meaning of probable error by 
research workers, but it is inaccurate and misleading, especially for samples of 
20 or less such as are dealt with in agricultural experiments. The inaccuracy of 
this explanation of the meaning of probable error has been realized for many 
years by competent statisticians, but no satisfactory treatment has heretofore 
been devised.¹

The attempted explanation of the probable error in terms of the expected
frequency of the occurrence of deviations of different sizes of the means of future
samples from the sample mean does raise a very interesting, important, and
legitimate question, namely, what is the probability of a second mean lying within
a certain multiple of the standard deviation of a first sample from the mean of
that first sample? This question is of fundamental concern to those engaged in
experimental work. Its answer will indicate to investigators reasonable devia- 
tions from the results of their first experiments, will form a valid basis for the 
rejection of doubtful observations or groups of such observations, and will form 
a basis for a test of the significance of the divergence of results in different 
experiments. It is found that the usual method of treating the probable error 
gives an overly optimistic idea of the smallness of the deviations that may be 
expected in future samples. 

The distribution function of the variable

$$v = \frac{x - z}{y},$$

where x is the mean of the first sample, z is the mean of the second sample,
and y is the standard deviation of the first sample, is obtained in this paper.
The sampled population is assumed to be normal.


¹ Camp, Burton H. "Suggested Problems for Mathematical Research," Journal American
Statistical Association, Supplement Vol. 30, No. 189A, Mar. 1935, p. 259, No. 5.
Let the sampled population be represented by

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}, \qquad -\infty \le x \le \infty. \tag{1}$$

If a sample of n is drawn from (1) the means, as is well known, will be distributed
as proportional to

$$e^{-\frac{n}{2}x^2}, \qquad -\infty \le x \le \infty, \tag{2}$$

and the standard deviation will be distributed as proportional to

$$y^{n-2}\,e^{-\frac{n}{2}y^2}, \qquad 0 \le y \le \infty. \tag{3}$$

If a second sample of n is drawn from (1) its mean will be distributed as
proportional to

$$e^{-\frac{n}{2}z^2}, \qquad -\infty \le z \le \infty. \tag{4}$$

Consider the expression

$$\frac{x - z}{y} \tag{5}$$

and call it v. Then v is the difference between the means of the two samples
measured in terms of the standard deviation of the first sample. The distribution
function of v is sought.

The three variables x, y, and z are independent. Let y, for the moment, have
a constant value and write

$$vy = x - z. \tag{6}$$

The probability of a given value of vy in d(vy) for a given value of y is now being
sought, that is, y is regarded as constant. This probability is proportional to

$$\left[\,y^{n-2}\,e^{-\frac{n}{2}y^2}\int_{-\infty}^{\infty} e^{-\frac{n}{2}\left[z^2 + (z + vy)^2\right]}\,dz\right]d(vy), \tag{7}$$

from the application of the following

Lemma. If x and y are independent variables, $-\infty \le x \le \infty$,
$-\infty \le y \le \infty$, and the probability of an x in dx is f(x)dx and the
probability of a y in dy is $\phi(y)\,dy$, then the probability of v = y - x in dv
is proportional to²

$$\left[\int_{-\infty}^{\infty} f(x)\,\phi(v + x)\,dx\right]dv.$$

Thus the probability of a value of v in dv for a given y is proportional to

$$y^{n-1}\,e^{-\frac{n}{2}y^2\left(1 + \frac{v^2}{2}\right)}\,dv, \tag{8}$$

² Baker, G. A. "Random Sampling from Non-Homogeneous Populations," Metron,
Vol. 8, No. 3, Feb. 1930, p. 68 (slightly modified).

since d(vy) = y dv for y constant. Hence the total probability of a particular
value of v in dv will be given as proportional to

$$\left[\int_0^{\infty} y^{n-1}\,e^{-\frac{n}{2}y^2\left(1 + \frac{v^2}{2}\right)}\,dy\right]dv, \tag{9}$$

which is proportional to

$$\frac{dv}{\left(1 + \dfrac{v^2}{2}\right)^{\frac{n}{2}}}. \tag{10}$$

If the number in the first sample is $n_1$ and the number in the second sample is
$n_2$, then (10) becomes

$$\frac{dv}{\left(1 + \dfrac{n_2\,v^2}{n_1 + n_2}\right)^{\frac{n_1}{2}}}. \tag{11}$$

This distribution, (11), permits an answer to be given to the question, what is
the probability that the mean of a sample of a given size $n_2$ will differ from the
mean of a first sample of size $n_1$ by as much as a constant multiple of the
standard deviation of the first sample? Thus, this distribution gives a clear and
comprehensible indication of the expected conformity of future experiments and
gives a valuable test for the significance of the difference between two means. If
it is desired to use this distribution as a rejection criterion, $n_1$ should be
taken so as to include as many items as possible and so as to exclude the doubtful
ones. The doubtful items should be included in the second sample. If the original
sample is broken up into two or more samples it must be done in such a way as
not to destroy the randomness of the resulting parts.

Example. Suppose for the purpose of illustration that a sample of four is to
be considered. The proper v-distribution is

$$\frac{\sqrt{2}}{\pi}\,\frac{dv}{\left(1 + \dfrac{v^2}{2}\right)^2}.$$

The value of v which is necessary to give a probability of one-half is a root of

$$\tan^{-1}\frac{v}{\sqrt{2}} + \frac{\sqrt{2}\,v}{v^2 + 2} = \frac{\pi}{4},$$

which is .9. That is, an interval of 1.8 times the standard deviation of the
sample of four with center at the mean of the sample is necessary for a
probability of one-half that the mean of the next sample of four will lie in this
interval. This compares with .75 times the standard deviation of the sample if

$$.6745\,\frac{\sigma}{\sqrt{n - 1}}$$

is used as the probable error of the mean and with .65 times the standard
deviation of the sample if

$$.6745\,\frac{\sigma}{\sqrt{n}}$$

is used as the probable error of the mean. The last two methods of calculating a
probable error with the interpretation indicated at the beginning of this paper
give the investigator an unwarranted feeling of assurance about the agreement
of future samples with a first sample.

If two samples of $n_1$ and $n_2$ are drawn from the normal population, (1), then
these samples can be combined for the purpose of calculating a standard deviation
and the difference between the means of the samples can be measured in
terms of the standard deviation of the combined sample. The distribution
function of the difference of the means divided by the standard deviation of the
combined sample is

$$\frac{dv}{\left[1 + \dfrac{n_1 n_2\,v^2}{(n_1 + n_2)^2}\right]^{\frac{n_1 + n_2}{2}}}. \tag{11'}$$


This distribution, (11’), is the basis for a valid test for the significance of the 
difference between two means. If either this test or the test based on distribu- 
tion (11) shows a significant difference between the means it can not be ignored. 

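“Student’s” t-distribution is proportional to

$$\frac{dt}{\left(1 + \dfrac{t^2}{N - 1}\right)^{\frac{N}{2}}}. \tag{12}$$

The above distributions can be easily transformed into t-distributions so that
“Student’s” tables can be used. For instance, if we put

$$v = \frac{\sqrt{2}\;t}{\sqrt{n - 1}}, \qquad N = n,$$

then (10) becomes proportional to (12). Again, put

$$v = \frac{\sqrt{n_1 + n_2}\;t}{\sqrt{n_2}\,\sqrt{n_1 - 1}}, \qquad N = n_1,$$

and (11) becomes proportional to (12). Finally, put

$$v = \frac{(n_1 + n_2)\;t}{\sqrt{n_1 n_2}\,\sqrt{n_1 + n_2 - 1}}, \qquad N = n_1 + n_2,$$

and (11’) becomes proportional to (12).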





Summary. The distributions found for the difference of the means of two
samples in terms of a standard deviation of one sample or combination of both
samples are similar to and easily transformed into “Student’s” t-distribution so
that his tables can be used. However, these distributions answer a practical,
interesting, and important question that “Student’s” t-distribution does not.
If in an experimental science a series of observations is made it is desirable to
know how much a similar series of observations could be expected to differ from
the set of observations now available. This deviation, if it is to mean anything,
must be expressed in terms of quantities available from the observations already
made. This paper gives the probability function of a deviation in the mean of
a future sample measured from the mean of a first sample and measured in terms
of the standard deviation of a first sample, that is, in terms of quantities known
from the first sample. It is a very definite advantage and a great gain in
assurance to know the point from which measurements are being made and the unit
in which they are expressed instead of making vague, ill-defined assumptions
about the zero point and unit length of the measuring scale. It is true that
differences that were formerly considered significant may not be so considered
now. But these differences would appear insignificant if experiments were
sufficiently repeated, so that the net result is fewer inconsistencies to explain
away.


ON SAMPLES FROM A MULTIVARIATE NORMAL POPULATION¹
By SOLOMON KULLBACK


1. Introduction. In this paper we shall discuss the distribution of certain
functions calculated for samples drawn from a multivariate normal population.
The method of solution is based on the theory of characteristic functions and
presents a further application of that theory to the distribution problem of
statistics.²

We shall have occasion to refer to the multivariate normal population whose
distribution law is given by

(1.1)  $F(x) = \pi^{-n/2}\,|B_{pq}|^{1/2}\,e^{-B(x-m,\,x-m)} \qquad (p, q = 1, 2, \dots, n)$

where $B(x-m, x-m)$ is the real, positive definite quadratic form of the
$x_p - m_p$ with matrix $\|B_{pq}\|$. Here $m_p$ is the mean in the population of the $p$th
variate and $B_{pq} = \Delta_{pq}/2\sigma_p\sigma_q\Delta$, where $\sigma_p$ is the standard deviation in the
population of the $p$th variate; $\Delta$ is the determinant of population correlations
$\rho_{pq} = \rho_{qp}$; $\Delta_{pq}$ is the co-factor of $\rho_{pq}$ in $\Delta$; and $|B_{pq}|$ is the determinant of
the matrix $\|B_{pq}\|$.

Since the integral of (1.1) over the entire field of variation of the variables is
unity, we have (using abbreviated notation)

(1.2)  $\int e^{-B(x-m,\,x-m)}\,dx = \pi^{n/2}\,|B_{pq}|^{-1/2}$

Equation (1.2) will be true if $\|B_{pq}\|$ is complex, provided its real part is
symmetric and positive definite.³

The distribution of sample means of samples from the population (1.1) is
independent of the distribution of the system of sample variances and covariances
and is given by⁴

(1.3)  $F_1(\bar{x}) = \pi^{-n/2}\,|A_{pq}|^{1/2}\,e^{-A(\bar{x}-m,\,\bar{x}-m)}$

where $A(\bar{x}-m, \bar{x}-m)$ is the real, positive definite quadratic form of the $\bar{x}_p - m_p$
with matrix $\|A_{pq}\|$. Here $\bar{x}_p = (1/N)\sum_{\alpha=1}^{N} x_{p\alpha}$ is the sample mean of the $p$th

¹ Presented to the American Mathematical Society, February 23, 1935.

² For more complete reference to the theory of characteristic functions as applied to
statistics see S. Kullback, Annals of Mathematical Statistics, Vol. 5 (1934), pp. 263-307.

³ J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 29 (1933), pp. 260 ff.

⁴ J. Wishart, Biometrika, Vol. 20A (1928), pp. 32-52.
J. Wishart and M. S. Bartlett, loc. cit.

variate, and $A_{pq} = NB_{pq}$, where $B_{pq}$ has been defined for equation (1.1). The
distribution law of the system of sample variances and covariances is given by⁵

(1.4)  $F_2(a) = \dfrac{|A_{pq}|^{(N-1)/2}}{\pi^{n(n-1)/4}\,\prod_{r=1}^{n}\Gamma\!\left(\frac{N-r}{2}\right)}\;e^{-A(a)}\,|a_{pq}|^{(N-n-2)/2}$

where $A(a) = \sum_{p,q=1}^{n} A_{pq}a_{pq}$ and $a_{pq} = a_{qp} = (1/N)\sum_{\alpha=1}^{N}(x_{p\alpha}-\bar{x}_p)(x_{q\alpha}-\bar{x}_q)$,
with $A_{pq}$ and $\bar{x}_p$ defined as for (1.3). Since the integral of (1.4) over the entire
field of variation of the $a_{pq}$ is unity, we have⁶

(1.5)  $\int e^{-A(a)}\,|a_{pq}|^{(N-n-2)/2}\,da = \pi^{n(n-1)/4}\,|A_{pq}|^{(1-N)/2}\,\prod_{r=1}^{n}\Gamma\!\left(\frac{N-r}{2}\right)$

Equation (1.5) will also hold if the matrix $\|A_{pq}\|$ is complex, provided its real
part is symmetric and positive definite.⁷


2. Variance. Consider a sample of N independent items from the normal
population (1.1). Let

(2.1)  $v = \sum_{p,q=1}^{n} a_{pq}$

where $a_{pq}$ is defined as in (1.4). From the theory of characteristic functions
and (1.5), we have that the characteristic function of the distribution law of $v$
is given by⁸

(2.2)  $\varphi(t) = \int e^{itv}\,F_2(a)\,da = |A_{pq}|^{(N-1)/2}\,|A_{pq} - it|^{-(N-1)/2}$

It may be readily shown that

(2.3)  $|A_{pq} - it| = |A_{pq}| - it\sum_{p,q=1}^{n} A^{pq}$

where $A^{pq}$ is the co-factor of $A_{pq}$ in $|A_{pq}|$.
We thus have for the distribution law⁸ of $v$

(2.4)  $P(v) = (A/c)^{(N-1)/2}\,\dfrac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itv}\,(A/c - it)^{-(N-1)/2}\,dt$

where $A = |A_{pq}|$, $c = \sum_{p,q=1}^{n} A^{pq}$ and $A/c > 0$ since $\|A_{pq}\|$ is positive definite.

By using the fact that⁹

(2.5)  $\dfrac{1}{2\pi i}\int_{a-i\infty}^{a+i\infty} e^{\mu z}\,z^{-k}\,dz = \begin{cases} \mu^{k-1}/\Gamma(k), & \mu > 0 \\ 0, & \mu \le 0 \end{cases}$

where $k > 0$, $a > 0$, we have

(2.6)  $P(v) = \dfrac{(A/c)^{(N-1)/2}}{\Gamma\!\left(\frac{N-1}{2}\right)}\;v^{(N-3)/2}\,e^{-(A/c)v}$

⁵ J. Wishart, loc. cit.

⁶ Cf. S. S. Wilks, Biometrika, Vol. 24 (1932), pp. 471-494.

⁷ A. E. Ingham, Proc. Cambridge Phil. Soc., Vol. 29 (1933), p. 271 ff. The considerations
in this paper will still hold if the condition above is imposed.

⁸ S. Kullback, loc. cit., p. 272.
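Equation (2.6) has the form of a Gamma law with shape $(N-1)/2$ and rate $A/c$. As a rough numerical sketch of that reading (the function names, the midpoint integrator and the values $N = 10$, $A/c = 2.5$ are illustrative assumptions, not taken from the paper), one may check that the density integrates to unity and has mean $(N-1)/(2A/c)$:

```python
import math

def variance_statistic_density(v, N, rate):
    """Density of v as read from (2.6); `rate` stands for A/c."""
    if v <= 0.0:
        return 0.0
    shape = (N - 1) / 2.0
    return rate ** shape / math.gamma(shape) * v ** (shape - 1.0) * math.exp(-rate * v)

def midpoint_integral(f, lower, upper, steps=20000):
    """Plain midpoint rule, adequate for this smooth integrand."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))

N, rate = 10, 2.5  # illustrative values only
total = midpoint_integral(lambda v: variance_statistic_density(v, N, rate), 0.0, 40.0)
mean = midpoint_integral(lambda v: v * variance_statistic_density(v, N, rate), 0.0, 40.0)
print(total, mean)
```

For a single variate ($n = 1$) the rate is $A/c = N/2\sigma^2$, so the mean above becomes $(N-1)\sigma^2/N$, the familiar expectation of the sample variance taken with divisor N.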


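3. Ratio of variances. If $v_1$ and $v_2$ represent the statistic $v$ (defined in
(2.1)), obtained from independent samples of $N_1$ and $N_2$ items respectively, then
it may be shown that the distribution law of $w = v_1/v_2$ is given by¹⁰

(3.1)  $P(w) = \dfrac{\Gamma\!\left(\frac{N_1+N_2-2}{2}\right)}{\Gamma\!\left(\frac{N_1-1}{2}\right)\Gamma\!\left(\frac{N_2-1}{2}\right)}\;w^{(N_1-3)/2}\,(1+w)^{(2-N_1-N_2)/2}$

If we set $w = e^{2z}\,n_1/n_2$, where $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$, we obtain for the
distribution law of $z$¹¹

(3.2)  $P(z) = \dfrac{2\,\Gamma\!\left(\frac{n_1+n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}\;n_1^{n_1/2}\,n_2^{n_2/2}\,e^{n_1 z}\,(n_1 e^{2z} + n_2)^{-(n_1+n_2)/2}$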

4. Student’s distribution. Consider a sample of N independent items from
the normal population (1.1). Let

(4.1)  $u = \sum_{p,q=1}^{n} (\bar{x}_p - m_p)(\bar{x}_q - m_q)$

where $\bar{x}_p$ and $m_p$ are defined as in (1.3). The characteristic function of the
simultaneous distribution function of $u$, defined as in (4.1), and $v$, defined as in
(2.1), is given by

(4.2)  $\varphi(t_1, t_2) = \int \exp\left\{ it_1 \sum_{p,q=1}^{n} (\bar{x}_p - m_p)(\bar{x}_q - m_q) + it_2 \sum_{p,q=1}^{n} a_{pq} \right\} F_1(\bar{x})\,F_2(a)\,d\bar{x}\,da$

where $F_1$ and $F_2$ are defined as in (1.3) and (1.4) respectively. From (1.2) and
(1.5) we have that

(4.3)  $\varphi(t_1, t_2) = (A/c)^{N/2}\,(A/c - it_1)^{-1/2}\,(A/c - it_2)^{-(N-1)/2}$

where $A$ and $c$ are defined as in (2.4). The simultaneous distribution of $u$ and
$v$ is given by

(4.4)  $P(u, v) = \dfrac{1}{(2\pi)^2}\int\!\!\int e^{-it_1 u - it_2 v}\,\varphi(t_1, t_2)\,dt_1\,dt_2$

which evaluated by a procedure similar to that used for (2.4) yields

(4.5)  $P(u, v) = \dfrac{(A/c)^{N/2}}{\Gamma\!\left(\frac{N-1}{2}\right)\Gamma\!\left(\frac{1}{2}\right)}\;u^{-1/2}\,e^{-uA/c}\;v^{(N-3)/2}\,e^{-vA/c}$

From (4.5) we may readily obtain the distribution of $z = u^{1/2}/v^{1/2}$ to be¹²

(4.6)  $D(z) = \dfrac{2\,\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(\frac{N-1}{2}\right)\Gamma\!\left(\frac{1}{2}\right)}\;(1 + z^2)^{-N/2}$

⁹ Cf. A. E. Ingham, loc. cit.
J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 28 (1932), p. 455 ff.

¹⁰ S. Kullback, note accepted for publication soon in the Annals of Math. Statistics.

¹¹ Cf. R. A. Fisher, I. Proc. International Math. Congress, Toronto (1924), Vol. 2, pp. 805-813.
R. A. Fisher, II. Statistical Methods for Research Workers, 4th Edition (1932), Edinburgh:
Oliver and Boyd, pp. 224-227.


5. k samples. Suppose we have $k$ independent samples of $N_1, N_2, \dots, N_k$
items respectively, drawn from the normal population defined by (1.1). Let
$u_r$, $(r = 1, 2, \dots, k)$ be the statistic $u$, defined by (4.1), for each of the $k$
samples respectively; let $V_r$, $(r = 1, 2, \dots, k)$ be the statistic $v$, defined by (2.1),
for each of the $k$ samples respectively; let $u_0$ and $V_0$ be the values of these
statistics for the sample of $N = N_1 + N_2 + \cdots + N_k$ items obtained by pooling
all the samples.

It may be readily verified that

(5.1)  $u_0 = \sum_{r=1}^{k} u_r N_r^2/N^2 + 2\sum_{\alpha,\beta=1}^{k} u_\alpha^{1/2} u_\beta^{1/2} N_\alpha N_\beta/N^2 \qquad (\alpha \neq \beta)$

(5.2)  $Nu_0 + NV_0 = \sum_{r=1}^{k} (N_r u_r + N_r V_r)$

(5.3)  $V_0 = \sum_{r=1}^{k} V_r N_r/N + \sum_{r=1}^{k} u_r M_r/N - 2\sum_{\alpha,\beta=1}^{k} u_\alpha^{1/2} u_\beta^{1/2} N_\alpha N_\beta/N^2 \qquad (\alpha \neq \beta)$

where $M_r = (NN_r - N_r^2)/N$.
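The pooling identities above can be checked numerically. The sketch below is an illustration under stated assumptions: each item is reduced to the scalar $y = \sum_p (x_p - m_p)$, so that $u = \bar{y}^2$ and $v$ is the mean squared deviation of the $y$'s; $u^{1/2}$ is taken with the sign of $\bar{y}$; each unordered pair $(\alpha, \beta)$ is counted once in the double sums; and the data and names are made up.

```python
from itertools import combinations

def sample_stats(y):
    """Return (N_r, signed root of u_r, u_r, V_r) for one sample of scalars."""
    n = len(y)
    mean = sum(y) / n
    return n, mean, mean * mean, sum((value - mean) ** 2 for value in y) / n

# Made-up samples of the scalar y = sum_p (x_p - m_p).
samples = [[1.0, 2.5, 0.5], [-1.0, 0.0, 2.0, 3.0], [4.0, -2.0]]
parts = [sample_stats(y) for y in samples]
N, root0, u0, V0 = sample_stats([value for y in samples for value in y])

# Cross term: sum over unordered pairs of u_a^(1/2) u_b^(1/2) N_a N_b / N^2.
cross = sum(a[1] * b[1] * a[0] * b[0] for a, b in combinations(parts, 2)) / N ** 2

u0_from_51 = sum(u * n * n for n, _, u, _ in parts) / N ** 2 + 2.0 * cross
lhs_52 = N * u0 + N * V0
rhs_52 = sum(n * u + n * V for n, _, u, V in parts)
V0_from_53 = (sum(V * n for n, _, _, V in parts) / N
              + sum(u * ((N * n - n * n) / N) for n, _, u, _ in parts) / N
              - 2.0 * cross)

print(u0, u0_from_51)
print(lhs_52, rhs_52)
print(V0, V0_from_53)
```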


In view of (2.6) and (4.5), it is evident that the simultaneous distribution
law of $u_r$, $V_r$, $(r = 1, 2, \dots, k)$ is given by

(5.4)  $P(u)\cdot Q(v) = \prod_{r=1}^{k} P(u_r; N_r)\,Q(V_r; N_r)$

where

(5.5)  $P(u_r; N_r) = \dfrac{N_r^{1/2}}{\Gamma\!\left(\frac{1}{2}\right)}\,(B/D)^{1/2}\,u_r^{-1/2}\,e^{-N_r u_r B/D}$

(5.6)  $Q(V_r; N_r) = \dfrac{N_r^{(N_r-1)/2}}{\Gamma\!\left(\frac{N_r-1}{2}\right)}\,(B/D)^{(N_r-1)/2}\,V_r^{(N_r-3)/2}\,e^{-N_r V_r B/D}$

and $B$ is the determinant $|B_{pq}|$ defined in (1.1) and $D = \sum_{p,q=1}^{n} B^{pq}$ where $B^{pq}$ is
the co-factor of $B_{pq}$ in $|B_{pq}|$.

Using (5.3) and (5.4), we find that the characteristic function of the
simultaneous distribution law of $\phi_r = V_r B/D$, $(r = 0, 1, \dots, k)$ is given by

(5.7)  $\varphi(t_0, t_1, \dots, t_k) = \int \exp\{U(t_0) + V(t_0, t_1, \dots, t_k)\}\,P(u)\cdot Q(v)\,du\,dv$

where

  $U(t_0) = (B\,it_0/D)\left\{ \sum_{r=1}^{k} u_r M_r/N - 2\sum_{\alpha,\beta=1}^{k} u_\alpha^{1/2} u_\beta^{1/2} N_\alpha N_\beta/N^2 \right\}, \qquad (\alpha \neq \beta)$

  $V(t_0, t_1, \dots, t_k) = (B/D)\,i\sum_{r=1}^{k} V_r\,(t_r + t_0 N_r/N)$

Let $u_r B/D = \xi_r^2$ and $V_r B/D = \eta_r$, $(r = 1, 2, \dots, k)$ and rewrite (5.7) as
the product of $k + 1$ integrals

(5.8)  $\varphi(t_0, t_1, \dots, t_k) = I_0\,I_1 \cdots I_k$

where

(5.9)  $I_0 = \dfrac{(N_1 N_2 \cdots N_k)^{1/2}}{\pi^{k/2}}\int e^{-T(k,\,\xi)}\,d\xi$

with

  $T(k, \xi) = \sum_{r=1}^{k} \xi_r^2\,(N_r - it_0 M_r/N) + 2it_0\sum_{\alpha,\beta=1}^{k} \xi_\alpha \xi_\beta N_\alpha N_\beta/N^2, \qquad (\alpha \neq \beta)$

and

(5.10)  $I_r = \dfrac{N_r^{(N_r-1)/2}}{\Gamma\!\left(\frac{N_r-1}{2}\right)}\int_0^\infty \exp\{-\eta_r\,(N_r - it_0 N_r/N - it_r)\}\;\eta_r^{(N_r-3)/2}\,d\eta_r$

¹² Cf. “Student,” Biometrika, Vol. 6 (1908-09), pp. 1-25.
R. A. Fisher, Metron, Vol. 5 (1925), pp. 90-104.
P. R. Rider, Annals of Mathematics, 2nd S., Vol. 31 (1930), pp. 579-582.
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By employing (1.2) we find that 


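(5.11)  $I_0 = (N_1 N_2 \cdots N_k)^{1/2} \begin{vmatrix} N_1 - it_0 M_1/N & it_0 N_1 N_2/N^2 & \cdots & it_0 N_1 N_k/N^2 \\ it_0 N_2 N_1/N^2 & N_2 - it_0 M_2/N & \cdots & it_0 N_2 N_k/N^2 \\ \cdots & \cdots & \cdots & \cdots \\ it_0 N_k N_1/N^2 & it_0 N_k N_2/N^2 & \cdots & N_k - it_0 M_k/N \end{vmatrix}^{-1/2}$

The determinant may be readily evaluated by removing the common factor $N_r$
from the $r$th row (remembering the value of $M_r$ as given in (5.3)) and applying
the operations (row 1 − row 2), (row 2 − row 3), $\dots$, and then column $k$ +
column 1 + column 2 + $\cdots$ + column $k - 1$. We thus obtain

(5.12)  $I_0 = (1 - it_0/N)^{-(k-1)/2}$

The integral in (5.10) is well-known and yields

(5.13)  $I_r = N_r^{(N_r-1)/2}\,(N_r - it_0 N_r/N - it_r)^{-(N_r-1)/2}$

There thus results

(5.14)  $\varphi(t_0, t_1, \dots, t_k) = G\,(1 - it_0/N)^{-(k-1)/2}\prod_{\alpha=1}^{k} (N_\alpha - it_0 N_\alpha/N - it_\alpha)^{-(N_\alpha-1)/2}$

where $G = \prod_{\alpha=1}^{k} N_\alpha^{(N_\alpha-1)/2}$.

The simultaneous distribution law of $\phi_r$, $(r = 0, 1, \dots, k)$ is given by

(5.15)  $P(\phi_0, \phi_1, \dots, \phi_k) = \dfrac{G}{(2\pi)^{k+1}}\int\!\cdots\!\int \dfrac{e^{-it_0\phi_0 - it_1\phi_1 - \cdots - it_k\phi_k}\;dt_0\,dt_1 \cdots dt_k}{(1 - it_0/N)^{(k-1)/2}\,\prod_{\alpha=1}^{k} (N_\alpha - it_0 N_\alpha/N - it_\alpha)^{(N_\alpha-1)/2}}$

Integrating successively with respect to $t_k, t_{k-1}, \dots, t_1$ and applying (2.5) we have

(5.16)  $P(\phi_0, \phi_1, \dots, \phi_k) = G\,\exp\left\{-\sum_{\alpha=1}^{k} N_\alpha\phi_\alpha\right\}\prod_{\alpha=1}^{k} \dfrac{\phi_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}\cdot\dfrac{1}{2\pi}\int_{-\infty}^{\infty} \dfrac{\exp\{-it_0\,(\phi_0 - N_1\phi_1/N - \cdots - N_k\phi_k/N)\}}{(1 - it_0/N)^{(k-1)/2}}\,dt_0$

¹³ Cf. A. C. Aitken, Quarterly Journal Math., Vol. 2 (1931), pp. 130-135.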
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and finally 
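(5.17)  $P(\phi_0, \phi_1, \dots, \phi_k) = G\,N^{(k-1)/2}\,e^{-N\phi_0}\,\dfrac{(\phi_0 - N_1\phi_1/N - \cdots - N_k\phi_k/N)^{(k-3)/2}}{\Gamma\!\left(\frac{k-1}{2}\right)}\,\prod_{\alpha=1}^{k} \dfrac{\phi_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

If we apply to (5.17) the transformation

(5.18)  $\phi_0 = \phi_0, \qquad \phi_r = N\zeta_r\phi_0/N_r \qquad (r = 1, 2, \dots, k)$

and integrate out $\phi_0$ we obtain for the simultaneous distribution law of
$\zeta_r = N_r\phi_r/N\phi_0 = N_r V_r/NV_0$

(5.19)  $D(\zeta_1, \zeta_2, \dots, \zeta_k) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\,(1 - \zeta_1 - \zeta_2 - \cdots - \zeta_k)^{(k-3)/2}\,\prod_{\alpha=1}^{k} \dfrac{\zeta_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

where the limits of variation in (5.19) are¹⁴

(5.20)  $0 \le \zeta_1 \le 1; \qquad 0 \le \zeta_r \le 1 - \zeta_1 - \zeta_2 - \cdots - \zeta_{r-1}, \qquad (r = 2, 3, \dots, k)$

6. Correlation ratio. Let $\zeta = \log(1 - \zeta_1 - \zeta_2 - \cdots - \zeta_k)$ where the
$\zeta_r$, $(r = 1, 2, \dots, k)$ are defined and distributed as in (5.19). The
characteristic function of the distribution law of $\zeta$ is given by

(6.1)  $\varphi(t) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\int (1 - \zeta_1 - \zeta_2 - \cdots - \zeta_k)^{(k+2it-3)/2}\,\prod_{\alpha=1}^{k} \dfrac{\zeta_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}\,d\zeta_\alpha$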


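where the limits of variation are given by (5.20). The integral in (6.1) is readily
evaluated as a Dirichlet integral,¹⁵ and we obtain

(6.2)  $\varphi(t) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\cdot\dfrac{\Gamma\!\left(\frac{k-1+2it}{2}\right)}{\Gamma\!\left(\frac{N-1+2it}{2}\right)}$

The distribution law of $\zeta$ is given by

(6.3)  $P(\zeta) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\cdot\dfrac{1}{2\pi}\int_{-\infty}^{\infty} e^{-it\zeta}\,\dfrac{\Gamma\!\left(\frac{k-1+2it}{2}\right)}{\Gamma\!\left(\frac{N-1+2it}{2}\right)}\,dt$

Now it may be shown that¹⁶

(6.4)  $\dfrac{1}{2\pi}\int_{-\infty}^{\infty} e^{-it\zeta}\,\dfrac{\Gamma\!\left(\frac{k-1+2it}{2}\right)}{\Gamma\!\left(\frac{N-1+2it}{2}\right)}\,dt = \dfrac{e^{\zeta(k-1)/2}\,(1 - e^{\zeta})^{(N-k-2)/2}}{\Gamma\!\left(\frac{N-k}{2}\right)}$

so that

(6.5)  $P(\zeta) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)\Gamma\!\left(\frac{N-k}{2}\right)}\;e^{\zeta(k-1)/2}\,(1 - e^{\zeta})^{(N-k-2)/2}$

If we set $e^{\zeta} = \eta^2$, then we obtain for the distribution¹⁷ of $\eta^2$

(6.6)  $D(\eta^2) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)\Gamma\!\left(\frac{N-k}{2}\right)}\;(\eta^2)^{(k-3)/2}\,(1 - \eta^2)^{(N-k-2)/2}$

From its definition we have that

(6.7)  $\eta^2 = (NV_0 - N_1V_1 - \cdots - N_kV_k)/NV_0$

which reduces to

(6.8)  $\eta^2 = (N_1W_1 + N_2W_2 + \cdots + N_kW_k)/NV_0$

where $W_\alpha = \sum_{p,q=1}^{n} (\bar{x}_{p\alpha} - \bar{x}_p)(\bar{x}_{q\alpha} - \bar{x}_q)$ with $\bar{x}_{p\alpha}$ the sample mean of the $p$th
variate in the $\alpha$th sample and $\bar{x}_p$ the sample mean of the $p$th variate in the
sample formed by pooling all the samples.¹⁸

¹⁴ Cf. J. Neyman and E. S. Pearson, I. Bulletin de l’Académie Polonaise des Sciences et
des Lettres, Série A, Sciences Mathématiques, 1931, pp. 460-481.

¹⁵ E. Goursat-E. R. Hedrick, Mathematical Analysis, Vol. I (1904) (Ginn and Co., N. Y.),
p. 308.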
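The reduction of (6.7) to (6.8) is the usual splitting of a total sum of squares into within-sample and between-sample parts. A small sketch, assuming as an illustration that each item has been reduced to a scalar (the sum of its variates), so that $N_\alpha V_\alpha$ is a within-sample sum of squares and $W_\alpha$ the squared deviation of a sample mean from the pooled mean; the data and the function name are made up:

```python
def correlation_ratio_two_ways(samples):
    """Return eta^2 computed from (6.7) and from (6.8) for scalar data."""
    pooled = [value for sample in samples for value in sample]
    N = len(pooled)
    grand_mean = sum(pooled) / N
    total_ss = sum((value - grand_mean) ** 2 for value in pooled)        # N V_0
    within_ss = 0.0                                                      # sum of N_r V_r
    between_ss = 0.0                                                     # sum of N_r W_r
    for sample in samples:
        mean = sum(sample) / len(sample)
        within_ss += sum((value - mean) ** 2 for value in sample)
        between_ss += len(sample) * (mean - grand_mean) ** 2
    return (total_ss - within_ss) / total_ss, between_ss / total_ss

# Made-up data for three samples.
eta_sq_67, eta_sq_68 = correlation_ratio_two_ways(
    [[4.1, 3.8, 4.4], [5.0, 5.6, 5.2, 4.9], [3.0, 3.5]]
)
print(eta_sq_67, eta_sq_68)
```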

In a similar manner, we have that the distribution law of $\eta_\alpha^2 = \zeta_\alpha$,
$(\alpha = 1, 2, \dots, k)$ is given by

(6.9)  $D(\eta_\alpha^2) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)\Gamma\!\left(\frac{N-N_\alpha}{2}\right)}\;(\eta_\alpha^2)^{(N_\alpha-3)/2}\,(1 - \eta_\alpha^2)^{(N-N_\alpha-2)/2}$

It may be of interest to point out another derivation for the distribution of
$h^2 = 1 - \eta^2$. Let

(6.10)  $\theta = (B/D)(N_1V_1 + N_2V_2 + \cdots + N_kV_k), \qquad \theta_0 = (B/D)\,NV_0$

The characteristic function of the simultaneous distribution law of $\theta$ and $\theta_0$ is
immediately derivable from (5.14) by replacing $t_0$ by $Nt_0$ and $t_r$ by $N_r t$,
$(r = 1, 2, \dots, k)$. There results

(6.11)  $\varphi(t, t_0) = (1 - it_0)^{-(k-1)/2}\,(1 - it_0 - it)^{-(N-k)/2}$

By a procedure similar to that already used we find that the simultaneous
distribution law of $\theta$ and $\theta_0$ is given by

(6.12)  $P(\theta, \theta_0) = \dfrac{\theta^{(N-k-2)/2}\,(\theta_0 - \theta)^{(k-3)/2}\,e^{-\theta_0}}{\Gamma\!\left(\frac{N-k}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}$

By applying to (6.12) the transformation $\theta = \theta_0 h^2$, $\theta_0 = \theta_0$ and integrating out
the value of $\theta_0$, we find for the distribution law of $h^2$

(6.13)  $D(h^2) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{N-k}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}\;(h^2)^{(N-k-2)/2}\,(1 - h^2)^{(k-3)/2}$

From (6.12) and (6.10) it may be shown that the following estimates of variance
all have the same expected value¹⁹

(6.14)  $\dfrac{N_1V_1 + N_2V_2 + \cdots + N_kV_k}{N-k}, \qquad \dfrac{NV_0}{N-1}, \qquad \dfrac{N_1W_1 + N_2W_2 + \cdots + N_kW_k}{k-1}$

¹⁶ Whittaker and Watson, Modern Analysis, 2nd Ed., pp. 283, 333.

¹⁷ Cf. R. A. Fisher, loc. cit., I.
H. Hotelling, Proc. National Academy of Sciences, Vol. XI (1925), pp. 657-662.

¹⁸ Cf. S. S. Wilks, loc. cit., p. 482.

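7. Distribution of variances. Let

(7.1)  $\theta_r = N_r V_r B/D \quad (r = 1, 2, \dots, k), \qquad \theta_0 = NV_0 B/D, \qquad \theta = (B/D)(N_1V_1 + N_2V_2 + \cdots + N_kV_k)$

where the right members of (7.1) are defined as in section 5. It is evident that
the characteristic function of the simultaneous distribution law of $\theta$, $\theta_0$, $\theta_r$,
$(r = 1, 2, \dots, k-1)$ is derivable from (5.14) by replacing $t_0$ by $Nt_0$, $t_r$ by
$N_r(t_r + t)$, $(r = 1, 2, \dots, k-1)$ and $t_k$ by $N_k t$. Thus

(7.2)  $\varphi(t, t_0, t_1, \dots, t_{k-1}) = (1 - it_0)^{-(k-1)/2}\,(1 - it_0 - it)^{-(N_k-1)/2}\,\prod_{\alpha=1}^{k-1} (1 - it_0 - it_\alpha - it)^{-(N_\alpha-1)/2}$

By proceeding as in section 5 we arrive at the result that the simultaneous
distribution law of $\theta$, $\theta_0$, $\theta_r$, $(r = 1, 2, \dots, k-1)$ is given by

(7.3)  $P(\theta, \theta_0, \theta_r) = \dfrac{e^{-\theta_0}\,(\theta_0 - \theta)^{(k-3)/2}\,(\theta - \theta_1 - \theta_2 - \cdots - \theta_{k-1})^{(N_k-3)/2}}{\Gamma\!\left(\frac{k-1}{2}\right)\Gamma\!\left(\frac{N_k-1}{2}\right)}\,\prod_{\alpha=1}^{k-1} \dfrac{\theta_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

where $\theta_0 \ge \theta$, $\theta \ge \theta_1 + \theta_2 + \cdots + \theta_{k-1}$.

By integrating out the variable $\theta_0$ from (7.3) we have for the simultaneous
distribution law of $\theta$, $\theta_r$, $(r = 1, 2, \dots, k-1)$

(7.4)  $D(\theta, \theta_r) = \dfrac{e^{-\theta}\,(\theta - \theta_1 - \theta_2 - \cdots - \theta_{k-1})^{(N_k-3)/2}}{\Gamma\!\left(\frac{N_k-1}{2}\right)}\,\prod_{\alpha=1}^{k-1} \dfrac{\theta_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

A procedure similar to that used to derive (5.19) yields for the simultaneous
distribution law of

(7.5)  $\psi_r = \theta_r/\theta \qquad (r = 1, 2, \dots, k-1)$

(7.6)  $P(\psi_1, \psi_2, \dots, \psi_{k-1}) = \dfrac{\Gamma\!\left(\frac{N-k}{2}\right)}{\Gamma\!\left(\frac{N_k-1}{2}\right)}\,(1 - \psi_1 - \psi_2 - \cdots - \psi_{k-1})^{(N_k-3)/2}\,\prod_{\alpha=1}^{k-1} \dfrac{\psi_\alpha^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

where the limits of variation in (7.6) are²⁰

(7.7)  $0 \le \psi_1 \le 1; \qquad 0 \le \psi_r \le 1 - \psi_1 - \psi_2 - \cdots - \psi_{r-1}, \qquad (r = 2, \dots, k-1).$

In a manner similar to the derivation of (6.6) we find the distribution law of
$h_\alpha^2 = \psi_\alpha$ $(\alpha = 1, 2, \dots, k-1)$, $h_k^2 = 1 - \psi_1 - \psi_2 - \cdots - \psi_{k-1}$ to be

(7.8)  $D(h_\alpha^2) = \dfrac{\Gamma\!\left(\frac{N-k}{2}\right)}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)\Gamma\!\left(\frac{N-k-N_\alpha+1}{2}\right)}\;(h_\alpha^2)^{(N_\alpha-3)/2}\,(1 - h_\alpha^2)^{(N-k-N_\alpha-1)/2}, \qquad (\alpha = 1, 2, \dots, k).$

From the distribution law in (7.3) we readily obtain that the characteristic
function of the distribution law of $\gamma_\alpha = \log(\theta_\alpha/(\theta_0 - \theta))$ is given by

(7.9)  $\varphi(t) = \dfrac{\Gamma\!\left(\frac{N_\alpha-1+2it}{2}\right)\Gamma\!\left(\frac{k-1-2it}{2}\right)}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}, \qquad (\alpha = 1, 2, \dots, k).$

¹⁹ Cf. J. Neyman and E. S. Pearson, II. Biometrika, Vol. 20A (1928), pp. 273-274.
S. Kullback, Annals of Mathematical Statistics, Vol. 6 (1935), pp. 76-77.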





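We thus have that the distribution law of $\gamma_\alpha$ is given by

(7.10)  $P(\gamma_\alpha) = \dfrac{1}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}\cdot\dfrac{1}{2\pi}\int_{-\infty}^{\infty} e^{-it\gamma_\alpha}\,\Gamma\!\left(\tfrac{N_\alpha-1+2it}{2}\right)\Gamma\!\left(\tfrac{k-1-2it}{2}\right)\,dt.$

The integral in (7.10) is known,²¹ and there results

(7.11)  $P(\gamma_\alpha) = \dfrac{\Gamma\!\left(\frac{N_\alpha+k-2}{2}\right)}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}\;e^{\gamma_\alpha(N_\alpha-1)/2}\,(1 + e^{\gamma_\alpha})^{-(N_\alpha+k-2)/2}$

If we set $e^{\gamma_\alpha} = \theta_\alpha/(\theta_0 - \theta) = \lambda_\alpha^2$ we have for the distribution of $\lambda_\alpha^2$

(7.12)  $D(\lambda_\alpha^2) = \dfrac{\Gamma\!\left(\frac{N_\alpha+k-2}{2}\right)}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}\;(\lambda_\alpha^2)^{(N_\alpha-3)/2}\,(1 + \lambda_\alpha^2)^{-(N_\alpha+k-2)/2}$

An extension of the procedure used to obtain (7.9) yields as the characteristic
function of the simultaneous distribution of $\gamma_1, \gamma_2, \dots, \gamma_k$

(7.13)  $\varphi(t_1, t_2, \dots, t_k) = \dfrac{\Gamma\!\left(\frac{k-1-2it_1-2it_2-\cdots-2it_k}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\,\prod_{\alpha=1}^{k} \dfrac{\Gamma\!\left(\frac{N_\alpha-1+2it_\alpha}{2}\right)}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

Successive application of the method used to evaluate (7.10) yields as the
simultaneous distribution law of the $\gamma_\alpha$

(7.14)  $P(\gamma_1, \gamma_2, \dots, \gamma_k) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\,(1 + e^{\gamma_1} + e^{\gamma_2} + \cdots + e^{\gamma_k})^{-(N-1)/2}\,\prod_{\alpha=1}^{k} \dfrac{e^{\gamma_\alpha(N_\alpha-1)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

The simultaneous distribution of the $\lambda_\alpha^2$ defined as in (7.12) is given by

(7.15)  $D(\lambda_1^2, \lambda_2^2, \dots, \lambda_k^2) = \dfrac{\Gamma\!\left(\frac{N-1}{2}\right)}{\Gamma\!\left(\frac{k-1}{2}\right)}\,(1 + \lambda_1^2 + \lambda_2^2 + \cdots + \lambda_k^2)^{-(N-1)/2}\,\prod_{\alpha=1}^{k} \dfrac{(\lambda_\alpha^2)^{(N_\alpha-3)/2}}{\Gamma\!\left(\frac{N_\alpha-1}{2}\right)}$

²⁰ Cf. J. Neyman and E. S. Pearson, loc. cit., I.

²¹ Whittaker and Watson, loc. cit., pp. 283, 383.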
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8. Conclusion. In this paper we have presented further instances of the 
applicability of the theory of characteristic functions to the distribution problem 
of statistics. In a subsequent paper the author hopes to illustrate the applica- 
tion of the results here developed to specific numerical problems. 


WASHINGTON, D.C.





ON A CRITERION FOR THE REJECTION OF OBSERVATIONS AND 
THE DISTRIBUTION OF THE RATIO OF DEVIATION TO 
SAMPLE STANDARD DEVIATION 


By WILLIAM R. THOMPSON


Criteria for the rejection of outlying observations may be designed to reject a
given fraction of all observations, or a proportion varying with the size of the
sample. Irwin¹ has discussed several criteria based on sampling from a normal
population which had been used previously, as well as one which he proposed.
This is based on the principle of fixing the expectation of rejecting an observation
from a sample independently of the aggregate number, $N$, of the sample. The
criterion, $\lambda$, is $1/\sigma$ times the interval between successive observations in ascending
order of magnitude, where $\sigma$ is the standard deviation of the sampled population.
In the same paper he gave, for different values of $N$, a table of $P_1(\lambda)$ and $P_2(\lambda)$,
respectively probabilities of exceeding given values of $\lambda$ for the first or second
such interval from either end. In actual use, however, $\sigma$ is estimated from the
sample standard deviation, and we are left to decide whether observations in
question are to be included or not in estimating the standard deviation as also
whether or not to modify this by addition or subtraction of an estimate of its
probable error. The object of the present communication is to develop a
criterion free from defects of this nature, depending only on the assumption of
random sampling from a normal universe. For this purpose we develop the
distribution of $\tau$ defined by

(1)  $\tau = \delta/s$

where $s$ is the sample standard deviation and $\delta$ is the deviation of an arbitrary
observation of the sample from the sample mean. This leads to definite criteria,
which are simple in application.

Accordingly, consider a sample $\{x_i\}$, $i = 1, \dots, N$, to be drawn at random
from a normal population of unknown mean and standard deviation, and that
the order of enumeration is arbitrary. Then $x_N$ is an arbitrary one of the
elements or observations. Now, let

(2)  $\bar{x} = \dfrac{\sum_{i=1}^{N} x_i}{N}, \qquad s = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}},$

(3)  $\delta = x_N - \bar{x}.$

Then we will prove that the distribution of $\tau = \delta/s$ in repeated sampling with a
fixed aggregate number, $N$, is given by substitution of

  $\sqrt{n}\cdot z = t = \sqrt{n}\cdot\tau/\sqrt{n + 1 - \tau^2}$

in the $z$ or $t$ distribution of “Student” and R. A. Fisher,² where $n = N - 2$.
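To this end let $N > 2$, and let $n = N - 2$, and

(4)  $(n+1)\,\bar{x}_1 = \sum_{i=1}^{n+1} x_i, \quad\text{and}\quad S_1(x - \bar{x}_1)^2 = \sum_{i=1}^{n+1} (x_i - \bar{x}_1)^2.$

Obviously, $(n+1)\,\bar{x}_1 + x_N = N\cdot\bar{x}$, whence

(5)  $\bar{x} - \bar{x}_1 = \dfrac{x_N - \bar{x}}{n+1} = \dfrac{\delta}{n+1}, \quad\text{whence}\quad x_N - \bar{x}_1 = \dfrac{n+2}{n+1}\,\delta.$

Furthermore, $N\cdot s^2 = S_1(x - \bar{x}_1)^2 + (n+1)(\bar{x}_1 - \bar{x})^2 + (x_N - \bar{x})^2$, whence

(6)  $N\cdot s^2 = S_1(x - \bar{x}_1)^2 + \dfrac{n+2}{n+1}\,\delta^2.$

Now, considering the separate samples, $\{x_i\}$, $i = 1, \dots, N-1$, and $\{x_N\}$,
of aggregate number, $N - 1$ and 1, respectively; Fisher has shown² that if we
set

(7)  $t = \dfrac{(x_N - \bar{x}_1)\cdot\sqrt{n}}{\sqrt{S_1(x - \bar{x}_1)^2}}\cdot\sqrt{\dfrac{n+1}{n+2}},$

then, for $t_0 > 0$, the probability, $p$, that $t < t_0$ is

(8)  $p = \dfrac{1}{2} + \dfrac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\;\Gamma\!\left(\frac{n}{2}\right)}\int_0^{t_0} \left(1 + \dfrac{t^2}{n}\right)^{-\frac{n+1}{2}} dt,$

and $P = 2(1 - p)$ is the probability that $|t| > t_0$.
Now, (5) and (6) in (7) give

(9)  $t = \dfrac{\frac{n+2}{n+1}\,\delta\cdot\sqrt{n}}{\sqrt{(n+2)\left(s^2 - \frac{\delta^2}{n+1}\right)}}\cdot\sqrt{\dfrac{n+1}{n+2}} = \dfrac{\tau\cdot\sqrt{n}}{\sqrt{n + 1 - \tau^2}},$

whence

(10)  $\tau = t\cdot\sqrt{\dfrac{n+1}{n + t^2}}, \quad\text{or}\quad \dfrac{\tau}{\sqrt{n+1}} = \sin\theta, \quad\text{where}\quad \tan\theta = \dfrac{t}{\sqrt{n}}.$

Accordingly, $P$ is the probability that $|\tau| > \tau_0 = t_0\cdot\sqrt{\dfrac{n+1}{n + t_0^2}}$.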
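The relation between $t$ and $\tau$ is easily put in computational form. The following is a minimal sketch, assuming the relations $t = \sqrt{n}\,\tau/\sqrt{n+1-\tau^2}$ and $\tau = t\sqrt{(n+1)/(n+t^2)}$ with $n = N - 2$; the function names are illustrative, and the values of $t$ used are the tabulated ones for $N = 3$ and $N = 4$ at $\phi = 0.2$.

```python
import math

def tau_from_t(t, N):
    """Convert a value of t (n = N - 2 degrees of freedom) into tau."""
    n = N - 2
    return t * math.sqrt((n + 1) / (n + t * t))

def t_from_tau(tau, N):
    """Inverse relation; requires tau^2 < n + 1 = N - 1."""
    n = N - 2
    return math.sqrt(n) * tau / math.sqrt(n + 1 - tau * tau)

# Tabulated t for N = 3 and N = 4 at phi = 0.2 are 9.51 and 4.30.
print(tau_from_t(9.51, 3))
print(tau_from_t(4.30, 4))
print(t_from_tau(tau_from_t(2.0, 12), 12))
```

Since $\tau^2$ can never exceed $N - 1$, the inverse relation is defined for every value of $\tau$ that a sample can actually produce, short of the extreme case.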
Thus, if we want to determine $\tau_0$ so that by rejecting all observations
deviating from the sample mean by more than $s\cdot\tau_0$ we shall have an average relative
frequency of rejections per sample which is fixed, say $\phi$; then we need only
to set $P = \phi/N$. This follows at once from the hypothesis as $x_N$ is a random
element of the random sample of $N$ elements drawn from the same normal
universe (of unknown mean and standard deviation). The criterion of
rejection, $s\cdot\tau_0$, is uniquely determined from the sample standard deviation and
number of elements, $N$, for any prescribed $\phi$. Dropping the subscript, critical
values of $\tau$ are given in Table I (together with corresponding values of $t$)
for $\phi$ = 0.2, 0.1, and 0.05 and values of $n = N - 2$ which should be sufficient
for most practical purposes. The normal deviate (for unit standard deviation
and the same $P$) lies between these values and is approached by $\tau$ and $t$ (in the
tabulated range of $\phi$) from opposite sides as $n$ increases, the approximation to $\tau$
being the closer of the two. Accordingly Sheppard’s tables may be used with
good approximation for $n > 1000$, with $\phi/N = P$, the probability of exceeding
numerically the given deviate. They may be used to advantage also in
interpolation between $n$ = 100, 1000 by means of differences at the tabulated
points.

TABLE I

              τ for given φ                     t for given φ
    N      φ = 0.2     0.1       0.05       φ = 0.2     0.1      0.05        n
    3      1.40646   1.41228   1.41373       9.51      19.08    38.19        1
    4      1.6454    1.6887    1.7103        4.30       6.20     8.84        2
    5      1.791     1.869     1.917         3.48       4.54     5.84        3
    6      1.895     1.997     2.067         3.19       3.97     4.84        4
    7      1.973     2.093     2.182         3.04       3.68     4.38        5
    8      2.041     2.170     2.274         2.97       3.51     4.12        6
    9      2.099     2.237     2.348         2.93       3.42     3.94        7
   10      2.144     2.295     2.413         2.89       3.36     3.83        8
   11      2.190     2.343     2.472         2.88       3.31     3.76        9
   12      2.229     2.388     2.521         2.87       3.28     3.70       10
   13      2.262     2.425     2.567         2.86       3.25     3.66       11
   14      2.296     2.463     2.598         2.86       3.24     3.60       12
   15      2.325     2.497     2.636         2.86       3.23     3.58       13
   16      2.357     2.522     2.670         2.87       3.21     3.56       14
   17      2.382     2.553     2.699         2.87       3.21     3.54       15
   18      2.404     2.576     2.733         2.87       3.20     3.54       16
   19      2.429     2.601     2.759      [illegible]   3.20     3.53       17
   20      2.448     2.625     2.783         2.88       3.20     3.52       18
   21      2.471     2.647     2.800         2.89       3.20     3.50       19
   22      2.487     2.661     2.819         2.89       3.19     3.49       20
   32      2.636     2.819     2.985         2.944      3.216    3.479      30
   42      2.737     2.925     3.093         2.991      3.248    3.489      40
  102      3.047     3.233     3.407         3.182      3.397    3.603     100
  202      3.266     3.448     3.621         3.347      3.546    3.736     200
  502      3.528     3.704     3.872         3.569      3.752    3.927     500
 1002      3.714     3.881     4.047         3.73[?]    3.908    4.078    1000

P = φ/N.

Note: τ is computed to 0.5 unit in the last place given from the given t which is believed
correct to 1 unit in the last place.
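The application of the criterion may be sketched as follows. This is an illustration only: the function name and the data are made up, the critical values are those of the $\phi = 0.1$ column of Table I for $N$ = 3 to 10, and $s$ is computed with divisor $N$ as in (2).

```python
import math

# Critical values of tau for phi = 0.1, copied from Table I (N = 3, ..., 10).
TAU_PHI_0_1 = {3: 1.41228, 4: 1.6887, 5: 1.869, 6: 1.997,
               7: 2.093, 8: 2.170, 9: 2.237, 10: 2.295}

def rejected_observations(observations, tau_table):
    """Return the observations deviating from the mean by more than s * tau_0."""
    N = len(observations)
    mean = sum(observations) / N
    s = math.sqrt(sum((x - mean) ** 2 for x in observations) / N)  # divisor N
    tau_0 = tau_table[N]  # raises KeyError if N is outside the tabulated range
    return [x for x in observations if abs(x - mean) > s * tau_0]

# Made-up measurements; the last one is deliberately discordant.
print(rejected_observations([10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 15.0], TAU_PHI_0_1))
print(rejected_observations([10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 10.3], TAU_PHI_0_1))
```

Note that the suspected observation is itself included in both the mean and $s$, which is the feature that removes the ambiguity discussed at the outset.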

A crude rejection system where we reject an observation if it deviate from the
mean of all others by more than a fixed constant times the standard deviation of
such a difference in terms of $\sigma$ as estimated from the variance of these others by
$\sigma^2 = S_1(x - \bar{x}_1)^2/n$, amounts to taking a fixed value of $t$ as criterion. The
intention is usually to fix the probability ($P$) of rejection of observations rather
than the expectation of rejections per sample ($\phi$); and this, of course, is the
expected approximate result for large samples. For small samples, however, say
$4 < N < 32$, by rejection of observations deviating thus by more than
[illegible], it appears from (7) and Table I that approximately $\phi$ would
be fixed rather than $P$.

The $\tau$-criterion not only affords a precise extension of such a rejection system,
but also a reduction of the actual process of application to a minimum, with one
noteworthy exception for the case, $N = 3$. Here we may use as criterion with
identical effect the ratio, $d_2/d_1$; where $x_1 \le x_2 \le x_3$, $d_2 = x_3 - x_2$, $d_1 = x_2 - x_1$, and
$d_2 \ge d_1$. This order can always be adopted for the test, and it is readily verified
that

(11)  $\dfrac{d_2}{d_1} = \dfrac{\sqrt{3}\cdot t - 1}{2},$

whence for $\phi$ = 0.2, 0.1, and 0.05, respectively we have $d_2/d_1$ = 7.74, 16.0, and 32.6.

Thus, for $N = 3$, we may take merely the ratio of the greater to the other
numerical deviation from the median observation as criterion.
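The case $N = 3$ may be checked directly. The sketch below assumes that (11) reads $d_2/d_1 = (\sqrt{3}\,t - 1)/2$ and that (7), with $n = 1$, reads $t = (x_3 - \bar{x}_1)\sqrt{n}\,\sqrt{(n+1)/(n+2)}/\sqrt{S_1}$, where $\bar{x}_1$ and $S_1$ refer to the two smaller observations; the function names and the three sample values are illustrative.

```python
import math

def gap_ratio_from_t(t):
    """Ratio d2/d1 corresponding to a given t when N = 3, as read from (11)."""
    return (math.sqrt(3.0) * t - 1.0) / 2.0

def t_for_largest_of_three(x1, x2, x3):
    """Equation (7) with n = 1, testing x3 against the other two observations."""
    n = 1
    mean_others = (x1 + x2) / 2.0
    sum_sq_others = (x1 - mean_others) ** 2 + (x2 - mean_others) ** 2
    return ((x3 - mean_others) * math.sqrt(n) / math.sqrt(sum_sq_others)
            * math.sqrt((n + 1) / (n + 2)))

# Tabulated t for N = 3 at phi = 0.2, 0.1, 0.05.
for t in (9.51, 19.08, 38.19):
    print(t, gap_ratio_from_t(t))

# Observations 0, 1, 6 have d1 = 1 and d2 = 5, so the ratio should come back as 5.
print(gap_ratio_from_t(t_for_largest_of_three(0.0, 1.0, 6.0)))
```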


Section 2 


Although not required in connection with the rejection criterion developed
above, there is a simple generalization of $\tau$ with a closely related distribution
which may be valuable in somewhat different circumstances. Consider the same
situation as given above, except that $\{x_i\}$ is divided into two subsets, where
$i = 1, \dots, N-k$, and $i = N-k+1, \dots, N$, respectively; giving two
random samples of aggregate number, $N - k$ and $k$. Let the means of these be
$\bar{x}_1$ and $\bar{x}_2$, respectively; and $s$ and $\bar{x}$ be as before. Then in general let

(12)  $\delta = \bar{x}_2 - \bar{x} \quad\text{and}\quad \tau = \dfrac{\delta}{s}.$

Further, let $n_1 + 1 = N - k$, $n_2 + 1 = k$, $S_1(x - \bar{x}_1)^2$ be the sum of squared
deviations in the first sub-sample and similarly $S_2(x - \bar{x}_2)^2$ be that for the
second. Then Fisher has shown² that the generalized

(13)  $t = \dfrac{(\bar{x}_2 - \bar{x}_1)\cdot\sqrt{n_1 + n_2}}{\sqrt{S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2}}\cdot\sqrt{\dfrac{(n_1+1)(n_2+1)}{n_1 + n_2 + 2}}$

is distributed as before for $n = n_1 + n_2$.

Obviously,

  $N\cdot\bar{x} = (n_1+1)\cdot\bar{x}_1 + (n_2+1)\cdot\bar{x}_2,$

whence

(14)  $-(n_1+1)(\bar{x}_1 - \bar{x}) = (n_2+1)(\bar{x}_2 - \bar{x}) = (n_2+1)\,\delta$

and

(15)  $S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2 = N\cdot s^2 - (n_1+1)(\bar{x}_1 - \bar{x})^2 - (n_2+1)(\bar{x}_2 - \bar{x})^2 = N\cdot s^2 - \dfrac{(n_2+1)(n_1+n_2+2)}{n_1+1}\,\delta^2$

whence

(16)  $t = \dfrac{\sqrt{n\cdot k}\cdot\tau}{\sqrt{n + 2 - k - k\cdot\tau^2}}, \quad\text{where } n = N - 2;$

i.e., $t = \sqrt{n}\cdot\tan\theta$, $\sqrt{n + 2 - k}\cdot\sin\theta = \sqrt{k}\cdot\tau$.

TABLE II. Values of $\tau_{(P,N,1)}$. [The body of this table is not recoverable from
the scan. The surviving column headings are P = 0.4, 0.3, 0.2, 0.1, 0.05, 0.02,
0.01, and the final row reads .25335, .38532, .52440, .67449, .84162, 1.03643,
1.28155, 1.64485, 1.95996, 2.32634, 2.57582.]

Note: $\tau_{(P,N,k)} = \sqrt{\dfrac{N-k}{k\,(N-1)}}\cdot\tau_{(P,N,1)}$


In connection with analysis of variance where the total sample may be divided 
into several subsets of observations, the generalized t may be used, accordingly, 
to indicate in a simple manner which (if any) of the means of subsets differ 
significantly from the general mean where the equivalent t-test is applicable. 

In general let $\tau_{(P,N,k)} \ge 0$ be a number such that P is the probability that
$|\tau| > \tau_{(P,N,k)}$, where, as above, N is the total number of observations in the
whole sample, k is the number of these in the subsample and τ is defined by (12).
Then by (16), obviously,

(17) $$\tau_{(P,N,k)} = \sqrt{\frac{N-k}{k(N-1)}}\;\tau_{(P,N,1)}.$$

In Table II are given values of $\tau_{(P,N,1)}$ for a range of values of the arguments, N
and P. The critical values of τ in Table I are simply values of this function for
P = φ/N where φ is taken as parameter, i.e., $\tau_{(\phi/N,\,N,\,1)}$.
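
The passage from Student's t to τ can be checked numerically. The sketch below is illustrative only: it assumes that (16) reads t = √(nk)·τ / √(N − k − kτ²) with n = N − 2, which is the form consistent with (17), and the function names are hypothetical.

```python
import math

def tau_from_t(t, N, k=1):
    """tau corresponding to a Student t value on n = N - 2 degrees of freedom.

    Assumes relation (16) in the form
        t = sqrt(n*k) * tau / sqrt(N - k - k*tau**2),
    solved here for tau.
    """
    n = N - 2
    return t * math.sqrt((N - k) / (k * (n + t * t)))

def rescale_to_k(tau_1, N, k):
    """Relation (17): tau(P,N,k) = sqrt((N-k)/(k(N-1))) * tau(P,N,1)."""
    return math.sqrt((N - k) / (k * (N - 1))) * tau_1

# With one degree of freedom (N = 3) Student's t is a Cauchy variable, so its
# two-sided 10 per cent point is tan(0.45*pi); no statistical library is needed.
t_10 = math.tan(0.45 * math.pi)
print(round(tau_from_t(t_10, N=3), 4))  # about 1.3968
```

For larger N the percentage point of t would have to come from a table or a statistical library; only the conversion itself is shown here.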

Rider* has given an interesting review of rejection criteria previously proposed. 
ON CERTAIN COEFFICIENTS USED IN MATHEMATICAL STATISTICS
By Everett H. Larguier, S.J.


I. Introduction 


(1.1) We have studied here certain coefficients arising in interpolation, numeri- 
cal differentiation and integration formulas in order to establish explicit expan- 
sions for these coefficients in the form of a finite summation. Ordinarily they 
are obtained by means of recursion relations, which necessarily demand the 
building up of a complete table in order to find the desired set of coefficients. By 
using the methods described in this paper, we are able to calculate any desired 
set independent of the ones which precede it in the table. In the literature we
find two other expansions of the difference quotients of zero, one by Jeffery¹ and
one by Boole.² Our expansion for the differential quotients of zero is the same as
one obtained by Jeffery; however, our proof is more elementary and simpler.

The Bernoulli numbers also find a wide range of application in many finite 
integration formulas, and hence our attention was drawn to the discussion of 
certain coefficients which occur in the study of these functions.⁴ As in the cases
mentioned above these coefficients are likewise ordinarily obtained by recursion 
formulas, but by our expansions they may be obtained directly. 


II. Difference Quotients of Zero 


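(2.1) It is our purpose here to show that this difference quotient of zero, $\Delta^m 0^n$,
may be expressed by the following summation:

$$\Delta^m 0^n = m! \sum \left(\frac{m}{m-1}\right)^{a_{m-1}} \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2} \left(\frac{2}{1}\right)^{a_1} \tag{1}$$

where $a_1, a_2, \ldots, a_{m-1} = 0, 1, 2, \ldots, n - m$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-1} \ge 0$.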


Obviously the number of terms in the summation is the number of combina- 
tions of n — m + 1 things taken m — 1 together where repetitions are allowed. 
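
The expansion can be compared with the difference quotient computed directly from the definition of the m-th forward difference. The following is a sketch under an assumption: it takes (1) to read Δ^m 0^n = m! Σ ∏ ((j+1)/j)^(a_j) over n − m ≥ a_1 ≥ ... ≥ a_(m−1) ≥ 0, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, factorial

def delta_direct(m, n):
    """m-th forward difference of x**n at x = 0, from the definition."""
    return sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(m + 1))

def delta_by_sum(m, n):
    """Assumed reading of (1), for 2 <= m <= n.

    Each tuple (a_1, ..., a_{m-1}) is non-increasing with entries in 0..n-m,
    i.e. a combination with repetition of n-m+1 things taken m-1 together.
    """
    total = Fraction(0)
    for a in combinations_with_replacement(range(n - m, -1, -1), m - 1):
        term = Fraction(1)
        for j, a_j in enumerate(a, start=1):
            term *= Fraction(j + 1, j) ** a_j   # factor ((j+1)/j)^(a_j)
        total += term
    return factorial(m) * total

print(delta_by_sum(3, 4), delta_direct(3, 4))  # 36 36
```

Exact rational arithmetic is used so that the two values can be compared without rounding.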
(2.2) By means of the recursion relation⁵

$$\Delta^m 0^n = m\,\Delta^m 0^{n-1} + m\,\Delta^{m-1} 0^{n-1} \tag{2}$$

1 Henry M. Jeffery, "On a method of expressing the combinations and homogeneous
products of numbers and their powers by means of differences of nothing," Quarterly
Journal of Pure and Applied Mathematics, vol. 4 (1861), pp. 364 ff.

2 George Boole, A Treatise on the Calculus of Finite Differences (Stechert, N. Y.), p. 20.

3 Loc. cit.

4 Steffensen, Interpolation (Williams & Wilkins, Baltimore), p. 125.

5 L. M. Milne-Thomson, Calculus of Finite Differences (Macmillan), p. 36, sec. 2.53, (2).
we are able to build up a table of values. By substitution it can be shown that
(1) satisfies the values of this table except when m = 0, 1 and for m > n, for
then the summation becomes meaningless. We therefore define the summation
to have the value 0 for m = 0, n > 0 and for m > n, and the value 1 for m = 1.
We exhibit one substitution below. When m = 3 and n = 4,

$$\Delta^3 0^4 = 3!\left\{ \left(\frac{3}{2}\right)^0 \left(\frac{2}{1}\right)^0 + \left(\frac{3}{2}\right)^0 \left(\frac{2}{1}\right)^1 + \left(\frac{3}{2}\right)^1 \left(\frac{2}{1}\right)^1 \right\} = 36.$$
(2.3) Taking (2), we proceed by repeated application of the recursion formula
and finally we have

$$\Delta^m 0^n = m^{n-m}\,\Delta^m 0^m + \sum_{d=m}^{n-1} m^{n-d}\,\Delta^{m-1} 0^d,$$

which, since $\Delta^m 0^m = m!$,⁶ becomes

$$\Delta^m 0^n = m^{n-m}\, m! + \sum_{d=m}^{n-1} m^{n-d}\,\Delta^{m-1} 0^d. \tag{3}$$
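
The recursion and its unrolled form can be set side by side in a few lines. This is a minimal sketch, assuming (2) reads Δ^m 0^n = m Δ^m 0^(n−1) + m Δ^(m−1) 0^(n−1) and (3) reads Δ^m 0^n = m^(n−m) m! + Σ_{d=m}^{n−1} m^(n−d) Δ^(m−1) 0^d; the function names are hypothetical.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def delta(m, n):
    """Table entry built from recursion (2)."""
    if m == 0:
        return 1 if n == 0 else 0   # the zeroth difference of 0**n
    if m > n:
        return 0
    return m * (delta(m, n - 1) + delta(m - 1, n - 1))

def delta_unrolled(m, n):
    """Assumed reading of (3): recursion (2) applied n - m times (1 <= m <= n)."""
    return m ** (n - m) * factorial(m) + sum(
        m ** (n - d) * delta(m - 1, d) for d in range(m, n)
    )

print(delta(3, 4), delta_unrolled(3, 4))  # 36 36
```

The memoised recursion is exactly the "building up of a complete table" that the explicit expansion is meant to avoid.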


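We will now prove (1). Proceeding by induction we assume (1) true for
m − 1. Hence from (3) we have

$$\Delta^m 0^n = m^{n-m}\, m! + \sum_{d=m}^{n-1} m^{n-d}\,(m-1)! \sum \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2} \left(\frac{2}{1}\right)^{a_1},$$

where $a_1, a_2, \ldots, a_{m-2} = 0, 1, 2, \ldots, d - m + 1$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-2} \ge 0$. This becomes

$$\Delta^m 0^n = m^{n-m}\, m! + m! \sum_{d=m}^{n-1} m^{n-d-1} \sum \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2} \left(\frac{2}{1}\right)^{a_1}. \tag{4}$$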





Using the symbol $\sum\sum$ for the double summation of (4), and writing $m^{n-d-1}$ as the
product $(m/(m-1))^{n-d-1} ((m-1)/(m-2))^{n-d-1} \cdots (2/1)^{n-d-1}$, we may write

$$\sum\sum = \sum_{d=m}^{n-1} \sum \left(\frac{m}{m-1}\right)^{n-d-1} \left(\frac{m-1}{m-2}\right)^{a_{m-2}+n-d-1} \cdots \left(\frac{3}{2}\right)^{a_2+n-d-1} \left(\frac{2}{1}\right)^{a_1+n-d-1}.$$
: iia n—m m—1 n—m 3 n—m 2 n—m 
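Now, $m^{n-m} = \left(\frac{m}{m-1}\right)^{n-m} \left(\frac{m-1}{m-2}\right)^{n-m} \cdots \left(\frac{3}{2}\right)^{n-m} \left(\frac{2}{1}\right)^{n-m}$, and also d varies
from m to n − 1. Hence by including $m^{n-m}$ under the summation we are
able to replace the double summation by a single one and have

$$\Delta^m 0^n = m! \sum \left(\frac{m}{m-1}\right)^{a_{m-1}} \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2} \left(\frac{2}{1}\right)^{a_1}$$

where $a_1, a_2, \ldots, a_{m-1} = 0, 1, 2, \ldots, n - m$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-1} \ge 0$.
Hence (1) is proved.⁷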


III. Differential Quotients of Zero 


(3.1) In Markoff's formula for numerical differentiation we meet coefficients
of the type $D^m 0^{(n)}$. We will show here that this differential quotient of zero
may be expressed by the following finite sum:

$$D^m 0^{(n)} = (-1)^{n-m}\, m! \sum (p_1 p_2 \cdots p_{n-m}) \tag{5}$$

where $p_1 > p_2 > \cdots > p_{n-m} > 0$ take on values from 1, 2, ..., n − 1. Obviously
the number of terms in the expansion will be the same as the number of
combinations of n − 1 things taken n − m together without repetitions.

(3.2) By means of the recursion formula⁸

$$D^m 0^{(n)} = (1 - n)\, D^m 0^{(n-1)} + m\, D^{m-1} 0^{(n-1)} \tag{6}$$

we are able to build up a table of values. By substitution it can easily be shown
that (5) satisfies the values of the table when n > m > 0. For the other values
the summation is meaningless; hence we define it to have the value 1 for
m = n > 0, and the value 0 for m > n and m = 0. When m = 2 and n = 4,
we have

$$D^2 0^{(4)} = (-1)^{4-2}\, 2!\, \{(3 \cdot 2) + (3 \cdot 1) + (2 \cdot 1)\} = 22,$$

which is the same value as found by (6).
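
The same comparison can be made for the differential quotients of zero. The sketch below assumes that (5) reads D^m 0^(n) = (−1)^(n−m) m! Σ(p_1 p_2 ... p_(n−m)) over distinct integers from 1 to n − 1, and that (6) reads D^m 0^(n) = (1 − n) D^m 0^(n−1) + m D^(m−1) 0^(n−1); the function names are hypothetical.

```python
from itertools import combinations
from math import factorial, prod

def d_by_sum(m, n):
    """Assumed reading of (5), with the conventions stated in (3.2)."""
    if m == n:
        return factorial(m)          # the sum is defined to be 1 here
    if m > n or m == 0:
        return 0                     # the sum is defined to be 0 here
    total = sum(prod(c) for c in combinations(range(1, n), n - m))
    return (-1) ** (n - m) * factorial(m) * total

def d_by_recursion(m, n):
    """Assumed reading of (6), with the same boundary values."""
    if m == n:
        return factorial(m)
    if m > n or m == 0:
        return 0
    return (1 - n) * d_by_recursion(m, n - 1) + m * d_by_recursion(m - 1, n - 1)

print(d_by_sum(2, 4), d_by_recursion(2, 4))  # 22 22
```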


7 Our expansion may be shown to be equal to that of Jeffery’s cited in the introduction, 
which is A™0™*" = m! &"0"*", where ¢"0”*" expresses the sum of all the homogeneous products 
of n dimensions which can be formed by the first m natural numbers and their powers. The 
proof of Jeffery’s expansion involves the use of complicated symbolic operators, while our 
proof uses elementary notions only. 

8 Steffensen, op. cit., p. 57, 58, (12) and (14). 
(3.3) Returning to (6), we obtain by its repeated application:

$$D^m 0^{(n)} = (-1)^{n-m} (n-1)^{(n-m)}\, D^m 0^{(m)} + m \sum_{a=0}^{n-m-1} (-1)^a (n-1)^{(a)}\, D^{m-1} 0^{(n-a-1)}$$

or, since $D^m 0^{(m)} = m!$,

$$D^m 0^{(n)} = (-1)^{n-m} (n-1)^{(n-m)}\, m! + m \sum_{a=0}^{n-m-1} (-1)^a (n-1)^{(a)}\, D^{m-1} 0^{(n-a-1)}. \tag{7}$$


In proving (5), we proceed by induction, assuming (5) true for m − 1; hence
by (7) we have

$$D^m 0^{(n)} = (-1)^{n-m} (n-1)^{(n-m)}\, m! + m! \sum_{a=0}^{n-m-1} (-1)^{n-m} (n-1)^{(a)} \sum (p_1 p_2 \cdots p_{n-m-a}) \tag{8}$$

where $p_1 > p_2 > \cdots > p_{n-m-a} > 0$ take the values 1, 2, ..., n − a − 2.
Expanding the double sum of (8) we have

$$\sum\sum = \sum^{n-2} (p_1 \cdots p_{n-m}) + \sum^{n-3} (n-1)(p_1 \cdots p_{n-m-1}) + \sum^{n-4} (n-1)(n-2)(p_1 \cdots p_{n-m-2}) + \cdots + \sum^{m-1} (n-1)(n-2) \cdots (m+1)(p_1) \tag{9}$$

in which $p_1 > p_2 > \cdots > p_s > 0$ always holds, where
s = n − m, n − m − 1, ..., 2, 1
in turn.
Upon inspection, it is evident that (9) contains all the terms of (5) with the
exception of (n − 1)(n − 2)···(m + 1)m. Hence, since by definition
$(n-1)^{(n-m)} = (n-1) \cdots (m+1)m$, we may include the first term on the
right-hand side of (8) under the summation and then we have proved (5).⁹


IV. The Coefficient $G_r^{(n)}$


(4.1) In discussing the Bernoulli numbers and the Bernoulli polynomials,
Steffensen¹⁰ makes use of the relation:

$$B_{2r}(x) = (-1)^r \sum_{n=0}^{r} G_r^{(n)} z^{r-n} \tag{10}$$


—|)»™ (n) 
® Jeffery’s expansion referred to in the introduction is D"0™ = ¢"0, where a 
m! 


expresses the sum of the combinations of the first n — 1 natural numbers taken n — m 
together. The remarks made above under article 2.3 concerning symbolic operators also 
apply here mutatis mutandis. 

10 Op. cit., p. 125, (24); cf. also Jacobi's theorem, Journal für die reine und angewandte
Mathematik (Crelle's Journal), vol. 12, pp. 268-269.
where $z = x - x^2$. We wish here to show that the coefficient $G_r^{(n)}$, ordinarily
found by means of recursion formulas, may be obtained from the following
summation:

$$G_r^{(n)} = (2r)^{(2n)} \sum_{N_n=3}^{r-n+1} [N_n] \sum_{N_{n-1}=3}^{N_n+1} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \tag{11}$$

where $[N] = (N)^{(2)}/(2N)^{(4)}$. Obviously the summation has no meaning for
n = 0, nor for r < n + 2. Therefore it will be necessary to make definitions
or devise other schemes for meeting this difficulty.
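Steffensen¹¹ shows that

$$G_r^{(0)} = 1 \ \text{ for } \ r \ge 0; \qquad G_r^{(r-1)} = 0 \ \text{ for } \ r > 1; \tag{12}$$

and likewise he gives the following recursion relation:

$$(2r - 2n)^{(2)}\, G_r^{(n)} = (2r)^{(2)}\, G_{r-1}^{(n)} + (r - n + 1)^{(2)}\, G_r^{(n-1)}. \tag{13}$$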

In accordance with (12), we define the sum of (11) to be equal to 1 for n = 0,
and to be equal to 0 for n = r − 1, when r > 1. By means of the recursion
formula (13), Steffensen¹² gives a table of values of $G_r^{(n)}$, which (11) may be
easily shown to satisfy. From this table we have the value $G_6^{(3)} = 10$. Using
this as an example of the expansion, we have by (11):

$$G_6^{(3)} = (12)^{(6)} \sum_{N_3=3}^{4} [N_3] \sum_{N_2=3}^{N_3+1} [N_2] \sum_{N_1=3}^{N_2+1} [N_1]$$

$$= (12)^{(6)} \Big\langle [3]\big\{[4]([5] + [4] + [3]) + [3]([4] + [3])\big\} + [4]\big\{[5]([6] + [5] + [4] + [3]) + [4]([5] + [4] + [3]) + [3]([4] + [3])\big\} \Big\rangle = 10.$$
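
The worked value can be reproduced both from the recursion and from the nested summation. The sketch below rests on assumptions: it reads (13) as (2r − 2n)^(2) G_r^(n) = (2r)^(2) G_(r−1)^(n) + (r − n + 1)^(2) G_r^(n−1), takes [N] = N^(2)/(2N)^(4), and takes the factor before the sums in (11) to be (2r)^(2n), the reading under which the example gives 10. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def ff(x, k):
    """Factorial power x^(k) = x (x-1) ... (x-k+1); equals 1 for k = 0."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def bracket(N):
    """Assumed reading of the symbol [N] = N^(2) / (2N)^(4)."""
    return Fraction(ff(N, 2), ff(2 * N, 4))

@lru_cache(maxsize=None)
def G_rec(r, n):
    """G_r^(n) from the boundary values (12) and the recursion (13)."""
    if n == 0:
        return Fraction(1)
    if r > 1 and n == r - 1:
        return Fraction(0)
    if n >= r:
        raise ValueError("this sketch covers 0 <= n <= r - 1 only")
    num = ff(2 * r, 2) * G_rec(r - 1, n) + ff(r - n + 1, 2) * G_rec(r, n - 1)
    return num / ff(2 * r - 2 * n, 2)

def nested(level, upper):
    """Sum over N = 3..upper of [N] times the next inner sum (upper limit N + 1)."""
    if level == 0:
        return Fraction(1)
    return sum(
        (bracket(N) * nested(level - 1, N + 1) for N in range(3, upper + 1)),
        Fraction(0),
    )

def G_sum(r, n):
    """Assumed reading of (11), for 1 <= n <= r - 2."""
    return ff(2 * r, 2 * n) * nested(n, r - n + 1)

print(G_rec(6, 3), G_sum(6, 3))  # 10 10
```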


(4.2) Before proving the general case, we will prove by induction that

$$G_r^{(1)} = (2r)^{(2)} \sum_{N_1=3}^{r} [N_1]. \tag{14}$$

Assuming (14) true for r − 1, we have by (12) and (13)

$$G_r^{(1)} = (2r)^{(2)} \sum_{N_1=3}^{r-1} [N_1] + (2r)^{(2)} [r] = (2r)^{(2)} \sum_{N_1=3}^{r} [N_1].$$

Hence (14) is valid.
(4.3) We shall prove (11) by induction with respect to r. By repeated application
of (13), we have


11 Op. cit., p. 125. 
12 Op. cit., p. 126. 
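$$\begin{aligned}
G_r^{(n)} ={}& \frac{(2r)^{(2)}}{(2r-2n)^{(2)}}\, G_{r-1}^{(n)} + \frac{(2r)^{(2)} (r-n+1)^{(2)}}{(2r-2n+2)^{(4)}}\, G_{r-1}^{(n-1)} \\
&+ \frac{(2r)^{(2)} (r-n+1)^{(2)} (r-n+2)^{(2)}}{(2r-2n+4)^{(6)}}\, G_{r-1}^{(n-2)} + \cdots \\
&+ \frac{(2r)^{(2)} (r-n+1)^{(2)} \cdots (r-1)^{(2)}}{(2r-2)^{(2n)}}\, G_{r-1}^{(1)} + \frac{(r-n+1)^{(2)} \cdots (r)^{(2)}}{(2r-2)^{(2n)}}\, G_r^{(0)} \\
={}& (2r)^{(2n)} \sum_{N_n=3}^{r-n} [N_n] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\
&+ (2r)^{(2n)} [r-n+1] \sum_{N_{n-1}=3}^{r-n+1} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\
&+ (2r)^{(2n)} [r-n+1][r-n+2] \sum_{N_{n-2}=3}^{r-n+2} [N_{n-2}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] + \cdots \\
&+ (2r)^{(2n)} [r-n+1][r-n+2] \cdots [r-1] \sum_{N_1=3}^{r-1} [N_1] \\
&+ (2r)^{(2n)} [r-n+1] \cdots [r].
\end{aligned}$$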


It is evident from inspection that this is nothing but an expanded form of (11), 
hence (11) is proved with respect to r. 

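(4.4) Proceeding in the same way as above to carry out the induction with respect to
n, we have again by repeated application of (13)

$$\begin{aligned}
G_r^{(n)} ={}& \frac{(r-n+1)^{(2)}}{(2r-2n)^{(2)}}\, G_r^{(n-1)} + \frac{(2r)^{(2)} (r-n)^{(2)}}{(2r-2n)^{(4)}}\, G_{r-1}^{(n-1)} + \frac{(2r)^{(4)} (r-n-1)^{(2)}}{(2r-2n)^{(6)}}\, G_{r-2}^{(n-1)} \\
&+ \cdots + \frac{(2r)^{(2r-2n-4)} (3)^{(2)}}{(2r-2n)^{(2r-2n-2)}}\, G_{n+2}^{(n-1)} \\
={}& (2r)^{(2n)} [r-n+1] \sum_{N_{n-1}=3}^{r-n+2} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\
&+ (2r)^{(2n)} [r-n] \sum_{N_{n-1}=3}^{r-n+1} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\
&+ \cdots + (2r)^{(2n)} [4] \sum_{N_{n-1}=3}^{5} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\
&+ (2r)^{(2n)} [3] \sum_{N_{n-1}=3}^{4} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1].
\end{aligned}$$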


From this latter equation, (11) follows immediately and therefore the proof is 
complete. 

(4.5) Bernoulli numbers may be expressed in terms of this coefficient $G_r^{(r)}$, as is
shown by Steffensen,¹³ in the following way:

$$B_{2r} = (-1)^r\, G_r^{(r)} \tag{15}$$


13 Op. cit., p. 125, (27). 
which we shall express in terms of (11). However, as (11) is meaningless for
n = r, we obtain the relation

$$(2r + 2)^{(2)}\, G_r^{(r)} = -(2)^{(2)}\, G_{r+1}^{(r-1)} \quad \text{for } r > 0, \tag{16}$$

which follows immediately from (12) and (13), and thereby obviate this difficulty.
Hence, by (11), (15) and (16), we can write

$$B_{2r} = (-1)^{r+1}\, \frac{(2)^{(2)}\, (2r+2)^{(2r-2)}}{(2r+2)^{(2)}} \sum_{N_{r-1}=3}^{3} [N_{r-1}] \sum_{N_{r-2}=3}^{N_{r-1}+1} [N_{r-2}] \cdots \sum_{N_1=3}^{N_2+1} [N_1]. \tag{17}$$


We note here that the definitions of the summation, given in 4.1, likewise hold. 
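
As a check on this route to the Bernoulli numbers, the first few values can be computed from the nested sums. The sketch below is built on assumptions: it takes (15) as B_2r = (−1)^r G_r^(r), (16) as (2r + 2)^(2) G_r^(r) = −(2)^(2) G_(r+1)^(r−1), and (11) with the factor (2r)^(2n) and [N] = N^(2)/(2N)^(4), combining them without further simplification. The function names are hypothetical.

```python
from fractions import Fraction

def ff(x, k):
    """Factorial power x^(k) = x (x-1) ... (x-k+1); equals 1 for k = 0."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def bracket(N):
    """Assumed reading of the symbol [N] = N^(2) / (2N)^(4)."""
    return Fraction(ff(N, 2), ff(2 * N, 4))

def nested(level, upper):
    """Nested sums of (11); defined as 1 when there is no summation (n = 0)."""
    if level == 0:
        return Fraction(1)
    return sum(
        (bracket(N) * nested(level - 1, N + 1) for N in range(3, upper + 1)),
        Fraction(0),
    )

def bernoulli_2r(r):
    """B_2r obtained by substituting (11) and (16) into (15), for r >= 1."""
    sign = (-1) ** (r + 1)
    prefactor = Fraction(ff(2, 2) * ff(2 * r + 2, 2 * r - 2), ff(2 * r + 2, 2))
    # The outermost sum of G_{r+1}^(r-1) has upper limit (r+1) - (r-1) + 1 = 3.
    return sign * prefactor * nested(r - 1, 3)

print([str(bernoulli_2r(r)) for r in range(1, 5)])  # ['1/6', '-1/30', '1/42', '-1/30']
```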


Saint Louis University
NOTICE OF THE ORGANIZATION OF THE INSTITUTE OF
MATHEMATICAL STATISTICS


For some time there has been a feeling that the theory of statistics would be
advanced in the United States by the formation of an organization of those per- 
sons especially interested in the mathematical aspects of the subject. As a con- 
sequence, a meeting of interested persons was arranged for September 12, 1935, at 
Ann Arbor, Michigan. At the meeting, it was decided to form an organization 
to be known as the Institute of Mathematical Statistics. A constitution and 
by-laws were adopted and the following officers elected to serve until December 
31st, 1936: President, H. L. Rietz; Vice-president, W. A. Shewhart; Secretary- 
Treasurer, A. T. Craig. A resolution, instructing the officers to investigate the 
feasibility of the affiliation of the Institute with the American Mathematical 
Society or with the American Statistical Association, was adopted. 

The constitution provides that membership in the Institute shall consist of 
Members, Fellows, Honorary Members, and Sustaining Members. A com- 
mittee on membership will establish qualifications requisite for the different 
grades of membership. The annual dues of members and fellows are five dollars 
a year and these include a year’s subscription to the official journal, the Annals 
of Mathematical Statistics. 

The next meeting of the Institute will be held in St. Louis, Missouri, in 
December of this year in connection with the meetings of the American Associa- 
tion for the Advancement of Science, the American Mathematical Society, and 
other organizations. 

Forms for application for membership in the Institute may be had by writing 
the Secretary-Treasurer at the University of Iowa, Iowa City, Iowa. 